THE ANNALS 


of 
MATHEMATICAL 
STATISTICS 


FOUNDED AND EDITED BY H. C. CARVER, 1930-1938 
EDITED BY S. S. WILKS, 1938-1949 


THE OFFICIAL JOURNAL OF THE INSTITUTE 
OF MATHEMATICAL STATISTICS 


Georges Darmois, 1888-1960 
    D. DUGUÉ 357 
The Existence and Construction of Balanced Incomplete Block Designs 
    HAIM HANANI 361 
Random Allocation Designs II: Approximate Theory for Simple Random Allocation 
    A. P. DEMPSTER 387 
Sampling Moments of Means from Finite Multivariate Populations 
    D. W. BEHNKEN 406 
On the Foundations of Statistical Inference: Binary Experiments 
    ALLAN BIRNBAUM 414 
Some Extensions of the Idea of Bias 
    H. R. VAN DER VAART 436 
Multivariate Correlation Models with Mixed Discrete and Continuous Variables 
    I. OLKIN AND R. F. TATE 448 
Limits for a Variance Component with an Exact Confidence Coefficient 
    W. C. HEALY, JR. 466 
Confidence Sets for Multivariate Medians 
    P. G. HOEL AND E. M. SCHEUER 477 
Distribution Free Tests of Independence Based on the Sample Distribution Function 
    J. R. BLUM, J. KIEFER, AND M. ROSENBLATT 485 
Some Exact Results for One-Sided Distribution Tests of the Kolmogorov-Smirnov Type 
Some Extensions of the Wald-Wolfowitz-Noether Theorem 
    JAROSLAV HÁJEK 506 
The Gap Test for Random Sequences 
    EVE BOFINGER AND V. J. BOFINGER 524 
The Multivariate Saddlepoint Method and Chi-Squared for the Multinomial Distribution 
    I. J. GOOD 535 
A Generalization of Wald’s Identity with Applications to Random Walks 
    H. D. MILLER 549 
A Characterization of the Weak Convergence of Measures 
    ROBERT BARTOSZYŃSKI 561 
Exponential Bounds on the Probability of Error for a Discrete Memoryless Channel 
    SAMUEL KOTZ 577 


(Continued on back cover) 


Vol. 32, No. 2 — June, 1961 





THE ANNALS 
OF MATHEMATICAL STATISTICS 


Subscription Rates. Current issues are $15 per volume (four issues of one calendar 
year) in the U. S. and Canada, $10 per volume elsewhere. Single issues are $4. Back 
numbers for all issues up to and including 1956 (Vol. XXVII) are $12 per volume, $3.50 
per issue, $200 for the first 25 volumes, $10 per additional volume purchased at the same 
time as Volumes I through XXV. 

Rates to members of the Institute of Mathematical Statistics are lower (see inside back 
cover). 

Communications concerning subscriptions, back numbers, payment of dues, etc., 
should be addressed to Gerald J. Lieberman, Treasurer, Institute of Mathematical Sta- 
tistics, Department of Statistics, Stanford University, Stanford, Calif. 

Communications concerning membership, changes of address, etc., should be ad- 
dressed to George E. Nicholson, Jr., Secretary, Department of Statistics, University of 
North Carolina, Chapel Hill, N. C. Changes of address which are to become effective for a 
given issue of the Annals should be reported to the Secretary on or before the 10th of the 
month preceding the month of issue. 

Editorial Office, Department of Statistics, Eckhart Hall, University of Chicago, Chi- 
cago 37, Illinois. William Kruskal, Editor. (See note on next page about change of Editor.) 

Preparation of manuscripts. Manuscripts should be submitted to the editorial 
office. Each manuscript should be typewritten, double spaced, with wide margins at sides, 
top, and bottom, and the original should be submitted with one additional copy, on paper 
that will take corrections. Dittoed or mimeographed papers are acceptable only if com- 
pletely legible. Footnotes should be reduced to a minimum, and where possible replaced by 
remarks in the text, or a bibliography at the end of the paper; formulae in footnotes should 
be avoided. References should follow current Annals style, and should be numbered alpha- 
betically according to authors’ names. 

Figures, charts, and diagrams should be professionally drawn on plain white paper or 
tracing cloth in black India ink twice the size they are to be printed. 

Authors are asked to keep in mind the typographical difficulties of complicated mathe- 
matical formulae. The difference between capital and lower-case letters should be clearly 
shown; care should be taken to avoid confusion between such pairs as zero and the letter O, 
the numeral 1 and the letter l, numeral 1 used as superscript and prime (’), alpha and a, 
kappa and k, mu and u, nu and v, eta and n, etc. Subscripts or superscripts should be 
clearly below or above the line. Bars above groups of letters (e.g., $\overline{\log x}$) and underlined 
letters (e.g., $\underline{x}$) are difficult to print and should be avoided. Symbols are automatically 
italicized by the printer and should not be underlined on manuscripts. Boldface letters may 
be indicated by underlining with a wavy line on the manuscript; boldface subscripts and 
superscripts are not available. Complicated exponentials should be represented with the 
symbol exp. In writing square roots the fractional exponent is preferable to the radical 
sign. Fractions in the body of the text (as opposed to displayed expressions) and fractions 
occurring in the numerators or denominators of fractions are preferably written with the 
solidus; thus $(a + b)/(c + d)$ rather than $\frac{a + b}{c + d}$. 

Authors will ordinarily receive only galley proofs. Fifty reprints without covers will be 
furnished free. Additional reprints and covers will be furnished at cost. 


Mail to the Annals of Mathematical Statistics should be addressed to either the Editor or the Treasurer, as de- 
scribed above. It should not be addressed to Waverly Press. 
COMPOSED AND PRINTED AT THE 
WAVERLY PRESS, INC., BALTIMORE, MARYLAND, U.S.A. 
Second-class postage paid at Baltimore, Maryland 





EDITORIAL STAFF 


EDITOR 
WILLIAM KRUSKAL 


ASSOCIATE EDITORS 


ALLAN BIRNBAUM DONALD A. DARLING OSCAR KEMPTHORNE 
DOUGLAS G. CHAPMAN WASSILY HOEFFDING 
W. S. CONNOR N. L. JOHNSON E. L. LEHMANN 


WITH THE COOPERATION OF 
J. R. BLUM          CYRUS DERMAN      SAMUEL KARLIN     J. W. PRATT 
R. C. BOSE          J. L. DOOB        SOLOMON KULLBACK  HOWARD RAIFFA 
D. L. BURKHOLDER    MEYER DWASS       EUGENE LUKACS     WALTER L. SMITH 
W. S. CONNOR        D. A. S. FRASER   INGRAM OLKIN      LIONEL WEISS 


PAST EDITORS OF THE ANNALS 
H. C. CARVER, 1930-1938 T. W. ANDERSON, 1950-1952 
S. S. WILKS, 1938-1949 E. L. LEHMANN, 1953-1955 
T. E. Harris, 1955-1958 


Published quarterly by the Institute of Mathematical Statistics in March, 
June, September and December. 


CHANGE OF EDITOR. New manuscripts arriving after July 1, 1961 should be 
submitted to the incoming Editor, J. L. Hodges, Jr., Department of Statistics, 
University of California, Berkeley 4, California. Correspondence about old man- 
uscripts should be directed to W. H. Kruskal. 


IMS INSTITUTIONAL MEMBERS 


ABERDEEN PROVING GROUNDS, BALLISTIC RESEARCH LABORATORIES, Aberdeen, Maryland 

AEROJET-GENERAL CORPORATION, P. O. Box 296, Azusa, California 

AMERICAN VISCOSE CORPORATION, Marcus Hook, Pennsylvania 

ATLANTIC REFINING COMPANY, 2700 Passyunk Avenue, Philadelphia, Pa. 

BELL TELEPHONE LABORATORIES, INC., TECHNICAL LIBRARY, 463 West Street, New York 14, 
New York 

BENDIX AVIATION CORPORATION, 1200 Fisher Bldg., Detroit, Michigan 

BOEING AIRPLANE COMPANY, Box 3707, Seattle, Washington 

CALIFORNIA RESEARCH CORPORATION, P. O. Box 1627, Richmond, California 

CASE INSTITUTE OF TECHNOLOGY, STATISTICAL LABORATORY, Cleveland 6, Ohio 

CATHOLIC UNIVERSITY OF AMERICA, STATISTICAL LABORATORY, MATHEMATICS DEPARTMENT, 
Washington, D. C. 

C-E-I-R, INC., 1200 Jefferson Davis Highway, Arlington 2, Virginia 

COLUMBIA UNIVERSITY, DEPARTMENT OF MATHEMATICAL STATISTICS, New York 27, N. Y. 

CORNELL UNIVERSITY, MATHEMATICS DEPARTMENT, Ithaca, New York 

FORD MOTOR COMPANY, P. O. Box 2053, Dearborn, Michigan 

GENERAL ELECTRIC COMPANY, Building C37, Room 248, Schenectady, New York 

GENERAL MOTORS CORPORATION, RESEARCH LABORATORIES, Twelve Mile and Mound 
Roads, Warren, Michigan 

INDIANA UNIVERSITY, THE LIBRARY, Bloomington, Indiana 

INTERNATIONAL BUSINESS MACHINES CORPORATION, MATHEMATICS AND APPLIED SCIENCE 
LIBRARY, 1271 Avenue of the Americas, New York 20, N. Y. 

IOWA STATE UNIVERSITY, STATISTICAL LABORATORY, Ames, Iowa 

LOCKHEED AIRCRAFT CORPORATION, ENGINEERING LIBRARY, Burbank, California 

MICHIGAN STATE UNIVERSITY, DEPARTMENT OF STATISTICS, East Lansing, Michigan 

MINNESOTA MINING AND MANUFACTURING COMPANY, APPLIED MATHEMATICS AND STATISTICS, 
St. Paul, Minnesota 

MONSANTO CHEMICAL COMPANY, 800 North Lindbergh Blvd., St. Louis 66, Missouri 







NATIONAL CASH REGISTER COMPANY, RESEARCH DEPARTMENT, Main and K Streets, Dayton 9, 
Ohio 

NATIONAL SECURITY AGENCY, Fort George G. Meade, Maryland 

NORTHWESTERN UNIVERSITY, DEPARTMENT OF MATHEMATICS, Evanston, Illinois 

PRINCETON UNIVERSITY, DEPARTMENT OF MATHEMATICS, SECTION OF MATHEMATICAL 
STATISTICS, Princeton, New Jersey 

PURDUE UNIVERSITY LIBRARIES, Lafayette, Indiana 

OKLAHOMA STATE UNIVERSITY, DEPARTMENT OF MATHEMATICS, Stillwater, Oklahoma 

RADIO CORPORATION OF AMERICA, R.C.A. LABORATORIES LIBRARY, Princeton, New Jersey 

RAMO-WOOLRIDGE CORPORATION, Los Angeles, California 

REMINGTON RAND-UNIVAC DIVISION, 315 Park Avenue South, New York 10, N. Y. 

SANDIA CORPORATION, Sandia Base, Albuquerque, New Mexico 

SOCONY MOBIL OIL COMPANY, INC., 150 E. 42nd Street, New York 17, New York 

SOUTHERN METHODIST UNIVERSITY, MATHEMATICS DEPARTMENT, Dallas 5, Texas 

SPACE TECHNOLOGY LABORATORIES, P. O. Box 95001, Los Angeles 45, California 

STANFORD UNIVERSITY, GIRSHICK MEMORIAL LIBRARY, Stanford, California 

STATE UNIVERSITY OF IOWA, Iowa City, Iowa 

UNION CARBIDE CORPORATION, 30 East 42nd Street, New York 17, New York 

UNION OIL COMPANY OF CALIFORNIA, UNION RESEARCH CENTER, Box 76, Brea, California 

UNITED STATES STEEL CORPORATION LIBRARY, Monroeville, Penna. 

UNIVERSITY OF CALIFORNIA, STATISTICAL LABORATORY, Berkeley, California 

UNIVERSITY OF ILLINOIS, SERIALS DEPARTMENT, Urbana, Illinois 

UNIVERSITY OF MICHIGAN, DEPARTMENT OF MATHEMATICS, Ann Arbor, Michigan 

UNIVERSITY OF NORTH CAROLINA, DEPARTMENT OF STATISTICS, Chapel Hill, North Carolina 

UNIVERSITY OF PUERTO RICO, SCHOOL OF TROPICAL MEDICINE, San Juan, Puerto Rico 

UNIVERSITY OF WASHINGTON, LABORATORY OF STATISTICAL RESEARCH, Seattle, Washington 

WESTINGHOUSE ELECTRIC CORPORATION, RESEARCH LABORATORIES, Pittsburgh 35, 
Pennsylvania 

W. R. GRACE AND COMPANY, RESEARCH DIVISION, Washington Research Center, Clarksville, 
Maryland 

W. R. GRACE AND COMPANY, DEWEY AND ALMY CHEMICAL DIVISION, 62 Whittemore Avenue, 
Cambridge 40, Massachusetts 





GEORGES DARMOIS, 1888-1960 
By D. Dugué 
Institut de Statistique de l’Université de Paris 


The death of Georges Darmois on January 3, 1960, has profoundly saddened 
the many friends whom the President of the International Statistical Institute 
numbered in all parts of the world. His personality was an harmonious synthesis, 
in which no element clashed with another. Mathematical statistics has had 
many theoretical contributions from him: he was one of the first to establish the 
form of those probability laws permitting a sufficient statistic; and the rela- 
tionship often called the Cramér-Rao inequality, inspired by the work of Sir 
Ronald Fisher, can also be partly considered as introduced by Georges Darmois. 

In the domain of relationships between random variables, the influence of 
Georges Darmois has also been strong: a number of his publications have formed 
an important contribution to extensions of Spearman’s theory of a general factor. 

Darmois undertook the study of random functions of time as early as 1929, 
and thus was among the first to develop the field of stochastic processes. But 
these theoretical studies never diverted his attention from concrete examples. 


He was a great teacher. In particular, he believed, almost eccentrically in our 
times, that the first quality of a teacher is to make himself understood to as large 
an audience as possible. Thus he always avoided complications of language, and 
so his thoughts have influenced many persons outside the scientific world of the 
academy. According to Geary, he was principally responsible for the great 
strides in French applied statistics. Darmois taught at the Centre de Préparation 
aux Affaires of the Chamber of Commerce in Paris. His last efforts, before being 
carried away by a merciless illness, were at the Institut de Statistique of the 
University of Paris. The Institut de Statistique had been founded in 1923 by 
a group comprising the greatest names of French statistical thought: March, 
Huber, E. Borel, C. Colson, C. Rist, Rueff. The orientation that these pio- 
neers had given the Institut was essentially economic and demographic. While 
conserving this aspect of its activity, Georges Darmois added two other areas 
of interest, industrial applications of statistics and operational research. He 
leaves us at a time when, because of the extensions he himself achieved, the 
Institut is going to need new and larger quarters in order to accommodate a 
student body that has tripled in the last few years. 

His statistical thought had style and conviction; he was, moreover, an excellent 
mathematician, and it was as a mathematician that he made his scientific debut. 
He was led to explore the theory of relativity, where he leaves a deep influence. 
It was, indeed, the Section of Astronomy of the Académie des Sciences to which 
he was admitted in 1955. 


Received April 27, 1960. Invited obituary 







Georges Darmois was truly French. He was what we call “un homme de 
l’Est.” Born at the foothills of the Vosges, he liked to explain that his name came 
from the land of the Armoises, on which his ancestors had labored and which 
had been, in the fifteenth century, the home of the false Jeanne d’Arc, Jeanne 
des Armoises. Far from thinking that his intellectual eminence separated him 
from his fellow countrymen, he valiantly performed his military duties in 1914, 
as again in 1939, when, with simplicity, he rejoined the military unit correspond- 
ing to his mobilization number. 

But his patriotism did not prevent him from travelling to other nations. One 
of his greatest joys was his election, in 1953, as President of the International 
Statistical Institute, a position that he held until his death. He was, in French 
University circles, one of those who most welcomed colleagues from abroad, 
aided by the smiling amiability of Madame Darmois. How many illustrious 
statisticians have I met at his home, near the Sorbonne, in the Odéon quarter, 
where the principals of the French revolution lived: Danton, Camille Desmoulins, 
Billaud Varennes? I think of the 1937 meeting of Sir Ronald Fisher (then simply 
Professor R. A. Fisher, F.R.S.) with Laugier, President of the Société Française 
de Biotypologie, and a Professor at the Sorbonne, who later became Adjoint 
Secretary General of the U.N.O.; I think of Mahalanobis, of Geary, Neyman, 
Kolmogorov, and Gini. 

It is in this harmonious setting that I shall always imagine my master, dis- 
cussing, with his jovial good-will and his deep competence, the latest youthful 
work that I had come to submit to him for presentation to the Comptes Rendus 
of the Académie des Sciences. For he was always passionately concerned with 
youth and with the efforts of young scientists to extend the scientific patrimony 
they had received. That is why his memory will be held in filial devotion by those 
of us who had the great privilege of knowing him. 


Bibliography of Georges Darmois 
(The notation C. R. is an abbreviation for Comptes Rendus throughout.) 


[1] “Sur les correspondances à normales concourantes,” C. R., Vol. 151 (1910), pp. 431-434 
[2] “Sur les courbes algébriques à torsion constante,” C. R., Vol. 157 (1913), pp. 1379-1382 
[3] “Sur les courbes à torsion constante,” Bull. des Sci. Math., 2nd Ser., Vol. 38 (1914), pp. 
154-157 
[4] “Sur la méthode de Laplace,” C. R., Vol. 158 (1914), pp. 546-549 
[5] “Sur les courbes algébriques à torsion constante,” Ann. Faculté des Sciences de l’Université 
de Toulouse, 3rd Ser., Vol. 11 (1921), pp. 67-189. Doctoral thesis 
[6] “Sur l’intégration locale des équations d’Einstein (problème extérieur),” C. R., Vol. 
176 (1923), pp. 646-648 
[7] “Sur l’intégration locale des équations d’Einstein (problème intérieur),” C. R., Vol. 
176 (1923), pp. 731-733 
[8] “Sur le problème intérieur dans le cas d’un espace temps courbe à symétrie sphérique,” 
C. R., Vol. 177 (1923), pp. 1276-1278 
[9] “Eléments de géométrie des espaces: Introduction aux théories de la relativité générale,” 
Ann. Physique, 10th Ser., Vol. 1 (1924), pp. 5-88 
[10] “Etude théorique et expérimentale du fluxmètre” (with G. Ribaud), Ann. Physique, 
10th Ser., Vol. 1 (1924), pp. 173-212 







[11] Les Equations de la Gravitation Einsteinienne, Mémorial des Sciences Mathématiques, 
No. 25, Gauthier-Villars et Cie., Paris, 1927 
[12] Statistique Mathématique, Encyclopédie scientifique appliquées (with preface by Michel 
Huber, French Director of General Statistics), 1st ed., Doin et Cie., Paris, 1928 
[13] “La construction de Huygens et la théorie mécanique de la propagation des ondes” 
(with F. Croze), J. Physique, Vol. 8 (1927), p. 17 
[14, 15] “Sur l’analyse et comparaison des séries statistiques qui se développent dans le 
temps” (The time correlation problem), (Congrès international des mathématiciens, 
Bologne, 1928.) Metron, Vol. 7, No. 4 (1928), pp. 211-250, and Rev. Inst. 
Internat. Stat., Vol. 8 (1929), pp. 3-42 
[16] “La structure et les mouvements de l’univers stellaire,” Actualités Scientifiques et 
Industrielles, Vol. 17, Hermann et Cie, Paris, 1930 
[17] “La méthode statistique dans les sciences d’observation,” Ann. Inst. Henri Poincaré, 
Vol. 3 (1932), pp. 191-228 
[18] “La théorie Einsteinienne de la gravitation. Les vérifications expérimentales,” 
Actualités Scientifiques et Industrielles, Vol. 43, Hermann et Cie, Paris, 1932 
[19] “La déformation de l’espace dans la théorie de la relativité,” C. R., Vol. 194 (1932), pp. 
2269-2271 
[20] “La déformation de l’espace dans la théorie de la relativité,” C. R., Vol. 195 (1932), pp. 
20-21 
[21] “Distributions statistiques rattachées à la loi de Gauss et la répartition des revenus,” 
Econometrica, Vol. 1 (1933), pp. 159-171 
[22] “La recherche des régularités statistiques et leur interprétation,” Bull. Soc. Biotypologie, 
Vol. 1 (1933), pp. 1-10 
[23] Statistique et Applications, 1st ed., Armand Colin, Paris, 1934; 4th ed., Armand Colin, 
Paris, 1952 
[24] “La théorie des deux facteurs de Spearman,” C. R., Vol. 199 (1934), pp. 1176-1178 
[25] “La théorie des deux facteurs de Spearman,” C. R., Vol. 199 (1934), pp. 1358-1360 
[26] “Sur les lois de probabilité à estimation exhaustive,” C. R., Vol. 200 (1935), pp. 1265- 
1266 
[27] La statistique appliquée à la psychologie. Conférence faite au Centre International de 
Synthèse. Semaine de la Statistique, 1935. Publications du Centre de Synthèse 
[28] “Les méthodes d’analyse factorielle: Analyse des corrélations,” Bull. Soc. Biotypologie, 
Vol. 3 (1935), pp. 15-57 
[29] “L’emploi des observations statistiques. Méthodes d’estimation,” No. 356, Actualités 
Scientifiques et Industrielles, Hermann et Cie, Paris, 1936 
[30] “Résumés exhaustifs d’un ensemble d’observations,” Bull. Inst. Internat. de Stat., 
Vol. 29 (1936), p. 131 
[31] “Sur ‘l’indétermination’ du facteur général dans la théorie de Spearman,” Mathematica, 
Vol. 12 (1936), pp. 211-216 
[32] “Mathématiques et statistique au Service de l’Economique,” Conférence faite au 
Centre Polytechnicien d’Etudes Economiques et publiée par ce centre, 1937. 
(Only a few copies printed.) 
[33] “Sur le rendement des observations statistiques,” J. Soc. Stat. Paris, Vol. 78 (1937), pp. 
310-319 
[34] Les Mathématiques de la Psychologie. Mémorial des Sciences Mathématiques, No. 98, 
Gauthier-Villars, Paris, 1940 
[35] “Sur certaines lois de probabilité,” C. R., Vol. 222 (1946), pp. 164-165 
[36] “Résumés exhaustifs et problème du Nil,” Note présentée le 15 juillet 1942, parues 
en 1946, C. R., Vol. 222 (1946), pp. 266-268 
[37] “Sur les limites de la dispersion de certaines estimations,” Rev. Inst. Internat. Stat., 
Vol. 13 (1945), pp. 8-15 







‘Analyse des liaison > pI ibilité,”’ 
pp. 231-240 
‘‘Sur certaines formes de liaisons de probabilité,’’ 
tions, Vol. 13 (1949), pp. 19-21 
‘‘Dispersion et liaison stochastique,’’ Congres de Philosophie des Sciences, Paris, 1949. 
Conférence au Séminaire International de Statistique de Genéve. 1949. (Résumé 
paru dans la Rev. Inst. Internat. Stat. 
yn A l’unité des expressions du principe de Huyghens pour les ondes électro 
magnétiques,’’ C. R., (with F. Croze), Vol. 228 (1949), pp. 824-826 
“Sur une proprieté caractéristique de la loi de prol ] de Laplace 
222 (1951), pp 1999-2000 
‘Sur diverses propriétés caractéristiques de la loi de probabil 
Bull. Inst. Internat. Stat., Vol. 33 (1951), pp. 79-82 
‘Analyse generale de liaisons, théor®mes généraux,’’ Rev. Inst. Inter 
21 (1953), pp. 2-8 
‘‘Sur l’estimation des grandeurs par leurs mesures,’ iaire de Bureau 


Gauthier-Villars et Cie, Paris, 1952 


‘““Mathématiques et Statistique,’’ Réunion Internationale de Rome 


égression. Résultats récents et problémes non résolus,” ( 


relles, Masson et Cie, Paris, 1955 pp 9-25 


bilités,”’ Vol. II, Sect. 10 (1955), pp. 4 


les proba PI 


vitation—la relativité générale verifications experimentales 
18 (1955), pp. 7-11, in Encyclopédie Frangaise, Larousse, Paris 
“Observations théoriques sur l’analyse factorielle, lineaire et 


Internat. d’Ana cto se 55, pp 295-301 





THE EXISTENCE AND CONSTRUCTION OF BALANCED 
INCOMPLETE BLOCK DESIGNS¹ 


By Haim HANANI 


Technion, Israel Institute of Technology, Haifa, Israel 
1. Introduction. Given a set $E$ of $v$ elements, and given positive integers 
$k$, $l$ ($l \le k \le v$) and $\lambda$, we understand by a tactical configuration $C[k, l, \lambda, v]$ 
(briefly, configuration) a system of subsets of $E$, having $k$ elements each, such 
that every subset of $E$ having $l$ elements is contained in exactly $\lambda$ sets of the 
system. 


A necessary condition [13, 9] for the existence of a configuration $C[k, l, \lambda, v]$ 
is that 

(i) $\lambda \binom{v-h}{l-h} \Big/ \binom{k-h}{l-h} = \text{integer}, \qquad h = 0, 1, \dots, l-1.$ 

Clearly, $\lambda \binom{v}{l} \big/ \binom{k}{l}$ is the number of elements of $C[k, l, \lambda, v]$ and 
$\lambda \binom{v-h}{l-h} \big/ \binom{k-h}{l-h}$ 
is the number of those elements of $C[k, l, \lambda, v]$ that contain $h$ fixed elements of $E$. 

A balanced incomplete block design (BIBD), $B[k, \lambda, v]$, ($k \le v$) is a configuration 
$C[k, l, \lambda, v]$ with $l = 2$. The elements of $B[k, \lambda, v]$ are called blocks. 
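As an illustration of condition (i), the following minimal Python sketch evaluates the quotients $\lambda \binom{v-h}{l-h} / \binom{k-h}{l-h}$ for $h = 0, \dots, l-1$. The function name `config_counts` is hypothetical and not part of the paper; the sketch only tests the necessary condition and says nothing about whether a configuration actually exists.

```python
from math import comb

def config_counts(k, l, lam, v):
    """Quotients lam*C(v-h, l-h)/C(k-h, l-h) for h = 0..l-1.

    Returns the list of quotients when all are integers (condition (i)),
    otherwise None. Entry h counts the blocks through h fixed elements.
    """
    counts = []
    for h in range(l):
        numerator = lam * comb(v - h, l - h)
        denominator = comb(k - h, l - h)
        if numerator % denominator != 0:
            return None  # condition (i) fails for this h
        counts.append(numerator // denominator)
    return counts

# k = 3, l = 2, lambda = 1, v = 7: number of blocks, then blocks per element.
fano_counts = config_counts(3, 2, 1, 7)
print(fano_counts)
```

For $l = 2$ the two entries are the usual parameters $b$ (number of blocks) and $r$ (blocks through one element) of a BIBD.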


In the usual terminology, a BIBD is an arrangement of $v$ elements in $b$ blocks 
of $k$ elements each so that every element occurs in $r$ blocks and every pair of 
elements occurs $\lambda$ times in all [8]. 


From (i) follows: 
A necessary condition for the existence of a BIBD is 

(ii) $\lambda(v - 1) \equiv 0 \pmod{k - 1}$ and $\lambda v(v - 1) \equiv 0 \pmod{k(k - 1)}$. 

In the sequel we shall consider (ii) as a condition on $v$ for fixed $k$ and $\lambda$. 
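Read as a condition on $v$ for fixed $k$ and $\lambda$, (ii) is easy to tabulate. The sketch below is only an illustration (the name `satisfies_condition_ii` is an assumption of this example); for $k = 3$, $\lambda = 1$ it lists the values $v \equiv 1$ or $3 \pmod 6$.

```python
def satisfies_condition_ii(k, lam, v):
    """Necessary condition (ii) for a BIBD B[k, lam, v]."""
    return (lam * (v - 1)) % (k - 1) == 0 and (lam * v * (v - 1)) % (k * (k - 1)) == 0

# Values of v in 3..19 admissible for k = 3, lambda = 1.
admissible = [v for v in range(3, 20) if satisfies_condition_ii(3, 1, v)]
print(admissible)
```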

Steiner triple systems [17] are BIBD with $k = 3$, $\lambda = 1$. It has been proved by 
Reiss [15] and by Moore [12] that in this case condition (ii) is also sufficient for 
the existence of a BIBD. Bose [1] proved that condition (ii) is also sufficient in 
the case $k = 3$, $\lambda = 2$. 

Received February 23, 1960; revised August 10, 1960. 

¹ The work on this paper was done at the Mathematics Research Center, United States 
Army, Madison, Wisconsin, sponsored by the United States Army, under Contract No. 
DA-11-022-ORD-2059. 

On the other hand, there are known cases in which condition (ii) is not sufficient. 
A BIBD with $k = n + 1$, $\lambda = 1$ and $v = n^2 + n + 1$ is a finite projective 
plane of order $n$. For such planes condition (ii) is clearly satisfied; it was, however, 
already conjectured by Euler [5], and proved by Tarry [20], that no projective 
plane of order $n = 6$ exists. Bruck and Ryser [2] have proved moreover 
that no finite projective plane exists if $n \equiv 1$ or $2 \pmod 4$ and the square-free 
part of $n$ contains at least one prime factor of the form $4m + 3$. 
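The Bruck and Ryser criterion, in the form quoted above, can be checked mechanically. The following Python sketch is an illustration only; the helper names are assumptions of this example. A result of `False` means merely that the criterion does not exclude order $n$, not that a plane of that order exists.

```python
def prime_factors(n):
    """Distinct prime factors of n, by trial division."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def square_free_part(n):
    """Product of the primes that divide n to an odd power."""
    part, p = 1, 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if exponent % 2 == 1:
            part *= p
        p += 1
    return part * n  # the remaining n is 1 or a single prime

def excluded_by_bruck_ryser(n):
    """True if n = 1 or 2 (mod 4) and the square-free part of n has a prime 4m + 3."""
    if n % 4 not in (1, 2):
        return False
    return any(p % 4 == 3 for p in prime_factors(square_free_part(n)))

excluded_orders = [n for n in range(2, 31) if excluded_by_bruck_ryser(n)]
print(excluded_orders)
```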

The purpose of this paper is to prove that condition (ii) is sufficient for the 
existence of a BIBD for $k = 3$ and 4 (and every $\lambda$) and also for $k = 5$, $\lambda = 1$, 4 
and 20.² The proof is given by induction on $v$ for any pair of fixed values of $k$ and 
$\lambda$, and it enables effective construction of the designs. The induction works also 
for larger values of $k$, but in these cases the existence of designs for initial values 
of $v$ remains undetermined. 

Tactical configurations $C[k, l, \lambda, v]$ with $\lambda = 1$ have been introduced by Moore 
[13] as tactical systems $S[k, l, v]$. From (i) it follows that a necessary condition 
for the existence of a system $S[k, l, v]$ is that 

(iii) $\binom{v-h}{l-h} \Big/ \binom{k-h}{l-h} = \text{integer}, \qquad h = 0, 1, \dots, l-1.$ 


This condition again is not always sufficient, as the nonexistence of a finite 
projective plane of order $n = 6$ shows. So far, it has been proved that (iii) is sufficient 
for $k = 3$, $l = 2$ (the mentioned Steiner triple systems) and for $k = 4$, 
$l = 3$ [9]. In the present paper, sufficiency of (iii) is also proved for $k = 4$, 
$l = 2$, (6.5), and for $k = 5$, $l = 2$,³ (7.10). No other general sufficient conditions 
on the existence of systems $S[k, l, v]$ are known so far; the special cases of systems 
known to exist may be found listed in [23]. 

For some detailed information on incomplete balanced block designs and for 
bibliography, see the excellent survey by Hall [8]. 

Considering the rather tedious proofs of combinatorial character, a special 
subdivision into sections has been adopted. Every subsection denoted by two 
figures consists of one of the following: a definition (e.g., (2.1)), a theorem (5.1), 
a proposition (3.4), a lemma (5.3), or a proof of a part of a theorem (5.5). 
Some of these subsections contain auxiliary lemmas which for reference are 
denoted by three figures (e.g., (5.3.1)). 


2. T-systems. 
(2.1) DEFINITION. Let a class of $m$ mutually disjoint sets $\tau_i$, $i = 0, 1, \dots, m - 1$, 
having $t$ elements each be given. If it is possible to form a system of $t^2$ m-tuples 
(i.e., sets having $m$ elements each) in such a way that 

(i) each m-tuple has exactly one element in common with each of the sets 
$\tau_i$, $i = 0, 1, \dots, m - 1$, and 

(ii) every two m-tuples have at most one element in common, then we denote 
the above system of m-tuples by $T_0[m, t]$. 

The class of all numbers $t$ for which systems $T_0[m, t]$ exist will be denoted by 
$T_0(m)$. 

² With the possible exception of B [5, 1 

³ Ibid. 





(2.2) DEFINITION. If a system $T_0[m, t]$ exists and if moreover there are in the 
system at least $e$ subsystems ($0 < e \le t$) each consisting of $t$ mutually disjoint 
m-tuples, then we denote such a system by $T_e[m, t]$. 

The class of all numbers $t$ for which systems $T_e[m, t]$ exist will be denoted by 
$T_e(m)$. 

As a direct consequence of the definitions we obtain 
(2.3) Let a system $T_e[m, t]$ ($0 \le e \le t$) be given and let $A \in \tau_i$, $B \in \tau_j$, $i \ne j$; 
then there exists exactly one m-tuple of $T_e[m, t]$ containing both elements $A$ and $B$. 
(2.4) If $e \ge d$, then $T_e(m) \subset T_d(m)$, i.e., $t \in T_e(m)$ implies $t \in T_d(m)$. 

(2.5) $t \in T_1(m)$ is possible only if $t \ge m$. 

We shall now prove 
(2.6) If $t$ is a power of a prime, then $t \in T_t(t)$. 

For $t = p^{\alpha}$ ($p$ prime, $\alpha$ a positive integer) finite projective planes $PG[2, p^{\alpha}]$ 
have been constructed with $t + 1$ points on a line [21]. Through every point in 
infinity go, besides the line in infinity, $t$ otherwise mutually disjoint lines. 
Omit the line and the points in infinity and choose any $t$ mutually disjoint lines 
of the remaining Euclidean plane $EG[2, p^{\alpha}]$ as the sets $\tau_i$, $i = 0, 1, \dots, t - 1$. 
The remaining lines form a system $T_t[t, t]$; compare [19]. 
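The construction in (2.6) can be made concrete for a prime $t = p$ (not a general prime power, which would need finite-field arithmetic). In the sketch below, which is an illustration under that assumption, the sets $\tau_i$ are the vertical lines $x = i$ of the plane over the integers mod $p$, and the m-tuples are the lines $y = ax + b$; lines sharing a slope $a$ form one subsystem of mutually disjoint m-tuples. The function names are hypothetical.

```python
from itertools import combinations

def affine_T_system(p):
    """Sets tau_i (lines x = i) and m-tuples (lines y = a*x + b mod p), grouped by slope a.

    Assumes p is prime.
    """
    groups = [frozenset((i, y) for y in range(p)) for i in range(p)]
    classes = [
        [frozenset((x, (a * x + b) % p) for x in range(p)) for b in range(p)]
        for a in range(p)
    ]
    return groups, classes

def is_T_system(groups, tuples):
    """Check Definition (2.1): t*t tuples, one element per tau_i, pairwise overlap <= 1."""
    t = len(groups[0])
    if len(tuples) != t * t:
        return False
    if any(len(tup) != len(groups) for tup in tuples):
        return False
    if any(len(tup & g) != 1 for tup in tuples for g in groups):
        return False
    return all(len(a & b) <= 1 for a, b in combinations(tuples, 2))

groups, classes = affine_T_system(5)
tuples = [line for cls in classes for line in cls]
print(is_T_system(groups, tuples), len(classes))
```

With $p = 5$ this gives 25 m-tuples split into 5 subsystems of 5 mutually disjoint lines each, which is the structure required of a $T_5[5, 5]$ in Definition (2.2).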

(2.7) If $t \in T_e(m_1)$ and $m_1 \ge m_2$, then also $t \in T_e(m_2)$. 

This is obtained by omitting the $m_1 - m_2$ sets $\tau_i$, $i = m_2, m_2 + 1, \dots, m_1 - 1$. 
(2.8) If $t \in T_e(m)$ and $s \in T_d(m)$, then $ts \in T_{ed}(m)$. 

Consider a 3-dimensional finite lattice of points with integral coordinates 
$0 \le x \le m - 1$, $0 \le y \le t - 1$, $0 \le z \le s - 1$. In this lattice the m-tuples of 
$T_e[m, t]$ may be described as functions $y = y_h(x)$, $h = 0, 1, \dots, t^2 - 1$, and the 
m-tuples of $T_d[m, s]$ as functions $z = z_j(x)$, $j = 0, 1, \dots, s^2 - 1$. For every pair 
of indices $(h, j)$ we form the m-tuple defined by the pair of functions $y = y_h(x)$, 
$z = z_j(x)$, $h = 0, 1, \dots, t^2 - 1$, $j = 0, 1, \dots, s^2 - 1$. Taking for $\tau_i$ the planes 
$x = i$, $i = 0, 1, \dots, m - 1$, it is easily verified that the conditions of the 
definition (2.1) are fulfilled and thus the obtained m-tuples form a system $T_0[m, ts]$. 
In order to show that this system is a $T_{ed}[m, ts]$, we remark that if the functions 
$y = y_{h_\alpha}(x)$, $\alpha = 0, 1, \dots, t - 1$, are mutually disjoint and also the functions 
$z = z_{j_\beta}(x)$, $\beta = 0, 1, \dots, s - 1$, are such, then also the $ts$ m-tuples given by the 
pairs of functions $y = y_{h_\alpha}(x)$, $z = z_{j_\beta}(x)$ are mutually disjoint. 
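The product construction of (2.8) can be sketched in Python by representing each m-tuple as a function table: a tuple whose entry at position $x$ is the value $y(x)$. In this representation condition (i) of (2.1) holds automatically, and condition (ii) says that two distinct functions agree in at most one position. The input systems below are built from lines mod a prime, which is an assumption of this example rather than part of the paper, and all names are hypothetical.

```python
from itertools import combinations, product

def lines_mod(p, m):
    """Function tables x -> a*x + b (mod p) for x = 0..m-1; assumes p prime and 2 <= m <= p."""
    return [tuple((a * x + b) % p for x in range(m)) for a in range(p) for b in range(p)]

def product_system(ys, zs):
    """Pair every y-function with every z-function: x -> (y(x), z(x))."""
    return [tuple(zip(y, z)) for y, z in product(ys, zs)]

def is_T0(funcs, size):
    """size**2 functions, any two of which agree in at most one position."""
    if len(funcs) != size * size:
        return False
    return all(
        sum(1 for u, w in zip(f, g) if u == w) <= 1
        for f, g in combinations(funcs, 2)
    )

ys = lines_mod(3, 3)             # m-tuples of a system with m = 3, t = 3
zs = lines_mod(5, 3)             # m-tuples of a system with m = 3, s = 5
combined = product_system(ys, zs)
print(len(combined), is_T0(combined, 15))
```

The 225 paired functions form a system with $m = 3$ and $ts = 15$, matching the count $(ts)^2$ required by Definition (2.1).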

From (2.6) and (2.7) by repeated use of (2.8) follows: 

(2.9) Let $t = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_n^{\alpha_n}$, where $p_i$ are primes and $\alpha_i$ positive integers, 
$i = 1, 2, \dots, n$. If $p_i^{\alpha_i} \ge m$, $i = 1, 2, \dots, n$, then $t \in T_t(m)$. 

Proposition (2.9) is equivalent to the theorem proved by MacNeish [10] and 
later by Mann [11] that under the conditions of (2.9) there exist at least $m - 1$ 
mutually orthogonal Latin squares. 
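The hypothesis of (2.9) depends only on the smallest prime-power factor of $t$. A minimal sketch, with hypothetical function names, that computes the largest $m$ to which (2.9) applies for a given $t \ge 2$:

```python
def prime_power_factors(t):
    """Prime-power factors p_i**alpha_i of t (t >= 2), by trial division."""
    factors, p = [], 2
    while p * p <= t:
        if t % p == 0:
            q = 1
            while t % p == 0:
                t //= p
                q *= p
            factors.append(q)
        p += 1
    if t > 1:
        factors.append(t)
    return factors

def largest_m_from_2_9(t):
    """Largest m with p_i**alpha_i >= m for every prime-power factor of t."""
    return min(prime_power_factors(t))

# t = 12 = 4 * 3: (2.9) applies up to m = 3, giving at least m - 1 = 2
# mutually orthogonal Latin squares of order 12.
print(prime_power_factors(12), largest_m_from_2_9(12))
```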

(2.10) $t \in T_t(m - 1)$ if and only if $t \in T_0(m)$. 

If $t \in T_0(m)$, then every element of $\tau_{m-1}$ belongs to $t$ otherwise mutually 
disjoint m-tuples. By omission of $\tau_{m-1}$ we thus obtain the required system 
$T_t[m - 1, t]$. If on the other hand $t \in T_t(m - 1)$, then, for every subsystem of $t$ 
mutually disjoint $(m - 1)$-tuples, we adjoin a fixed element to all the $(m - 1)$-tuples 
of such subsystem. Denoting by $\tau_{m-1}$ the set of these additional elements, 
we obtain a system $T_0[m, t]$. 

From (2.10) and (2.5) follows: 

(2.11) $t \in T_0(m)$ is possible only if $t \ge m - 1$. 
(2.12) If $t \in T_s(m)$ and $s \in T_0(m)$, then $ts \in T_{s^2}(m)$. 

Consider a 3-dimensional finite lattice of points with integral coordinates 
$0 \le x \le m - 1$, $0 \le y \le t - 1$, $0 \le z \le s - 1$. In this lattice denote by 
$y = y_i^j(x)$, $i = 0, 1, \dots, t - 1$, the functions corresponding to the $t$ m-tuples 
of the $j$th subsystem of mutually disjoint m-tuples of $T_s[m, t]$, $j = 0, 1, \dots, s - 1$, 
and by $y = y_h(x)$, $h = 0, 1, \dots, t^2 - ts - 1$, the functions corresponding to 
the remaining m-tuples of $T_s[m, t]$. By $z = z_k(x)$, $k = 0, 1, \dots, s^2 - 1$, denote 
the functions corresponding to the m-tuples of $T_0[m, s]$. Now form the pairs of 
functions 

(i) $y = y_i^j(x)$, $z = z_k(x) + j \pmod{s}$, ($i = 0, 1, \dots, t - 1$; $j = 0, 1, \dots, s - 1$; 
$k = 0, 1, \dots, s^2 - 1$), 

(ii) $y = y_h(x)$, $z = z_k(x)$, ($h = 0, 1, \dots, t^2 - ts - 1$; $k = 0, 1, \dots, s^2 - 1$), 
obtaining their values in the $yz$ plane. These functions are m-tuples, any two 
of which have at most one element in common. Moreover for every fixed $k$, 
$k = 0, 1, \dots, s^2 - 1$, the $ts$ functions (i) are mutually disjoint: for different 
$j$'s the functions $z = z_k(x) + j$ are disjoint, and for fixed $j$ and different 
$i$'s the functions $y = y_i^j(x)$ are disjoint. 

From (2.10), (2.4) and (2.12) follows 
(2.13) If $t \in T_t(m)$ and $m - 1 \in T_{m-1}(m - 1)$, then $t(m - 1) \in T_{(m-1)^2}(m)$. 


3. B-systems. 
(3.1) DEFINITION. Let a set $E$ having $v$ elements be given; further let 
$K = \{k_i\}_{i=1}^{n}$ be a finite set of integers $3 \le k_i \le v$, $i = 1, 2, \dots, n$, and $\lambda$ a 
positive integer. If it is possible to form a system of blocks (subsets of $E$) in 
such a way that 

(i) the number of elements in each block is some $k_i \in K$ and 

(ii) every (unordered) pair of elements of $E$ is contained in exactly $\lambda$ blocks, 
then we shall denote such a system by $B[K, \lambda, v]$. 

The class of all numbers $v$ for which systems $B[K, \lambda, v]$ exist will be denoted by 
$B(K, \lambda)$. 

If $K = \{k\}$ consists of one number $k$ only we shall write $B[k, \lambda, v]$ and $B(k, \lambda)$ 
instead of $B[\{k\}, \lambda, v]$ and $B(\{k\}, \lambda)$ respectively. 

The systems $B[k, \lambda, v]$ are the BIBD introduced in Section 1. 
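Definition (3.1) translates directly into a checker. The sketch below is an illustration only: it assumes the elements of $E$ are labelled $0, \dots, v - 1$ and that blocks are given as tuples, and the name `is_B_system` is hypothetical. The example design is the seven-point plane with $k = 3$, $\lambda = 1$, $v = 7$.

```python
from collections import Counter
from itertools import combinations

def is_B_system(blocks, K, lam, v):
    """Check Definition (3.1) on the point set {0, ..., v-1}."""
    for block in blocks:
        if len(set(block)) != len(block) or len(block) not in K:
            return False  # repeated element or block size outside K
        if any(not (0 <= x < v) for x in block):
            return False  # element outside E
    pair_counts = Counter(
        pair for block in blocks for pair in combinations(sorted(block), 2)
    )
    return all(pair_counts[pair] == lam for pair in combinations(range(v), 2))

fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
print(is_B_system(fano, {3}, 1, 7))
```

Taking every block twice gives a system with $\lambda = 2$ on the same elements, so `is_B_system(fano + fano, {3}, 2, 7)` also holds.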
(3.2) DEFINITION. If a system $B[K, \lambda, v]$ exists and if moreover there exists 
an element $A \in E$ and a number $m \in K$ such that $(m - 1)$ divides $(v - 1)$ and 
the set $E - \{A\}$ can be split into $(v - 1)/(m - 1)$ mutually disjoint subsets 
$E_i$, $i = 1, 2, \dots, (v - 1)/(m - 1)$, each having $(m - 1)$ elements, in such a 
way that each of the sets $E_i \cup \{A\}$, $i = 1, 2, \dots, (v - 1)/(m - 1)$, appears 
exactly $\lambda$ times as a block in the system $B[K, \lambda, v]$, then we denote such system 
by $B_m[K, \lambda, v]$, and the class of all numbers $v$ for which systems $B_m[K, \lambda, v]$ 
exist by $B_m(K, \lambda)$. 





EXISTENCE AND CONSTRUCTION OF BIBD 365 


(3.3) Derrriirtion. If a system Blk, \, v] exists and if moreover there exists a 
number m ¢ K such that m divides v, and the set FE can be split into v/m mutually 
disjoint subsets FL, 7 = 1, 2, ---,v/m, each having m elements and each appear- 
ing exactly \ times as a block in the system B[K, X, v], then we denote such 
system by B,[K, A, v}. 

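As an immediate consequence of the definitions we have:
(3.4) From $v \in B_m(K, \lambda)$ follows $v \in B(K, \lambda)$.
(3.5) From $v \in B^m(K, \lambda)$ follows $v \in B(K, \lambda)$.
(3.6) From $v \in B(k, 1)$ follows $v \in B_k(k, 1)$.
(3.7) If $K' \subset K$ then from $v \in B(K', \lambda)$ follows $v \in B(K, \lambda)$.
(3.8) If $\lambda'$ is a factor of $\lambda$ (i.e. $\lambda' \mid \lambda$), then from $v \in B(K, \lambda')$, $v \in B_m(K, \lambda')$ and $v \in B^m(K, \lambda')$ follow $v \in B(K, \lambda)$, $v \in B_m(K, \lambda)$ and $v \in B^m(K, \lambda)$ respectively.
(3.9) If $v \in B(K', \lambda')$ and if for every $k' \in K'$, $k' \in B(K, \lambda'')$, then $v \in B(K, \lambda)$, where $\lambda = \lambda'\lambda''$.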

We shall now prove the following proposition:
(3.10) If $v = (m - 1)u + 1$, where $u \in B(K', \lambda')$ and if for every $k' \in K'$, $(m - 1)k' + 1 \in B_m(K, \lambda'')$, then $v \in B_m(K, \lambda)$, where $\lambda = \lambda'\lambda''$.

Consider a 2-dimensional finite lattice of points $(x, y)$ with integral coordinates $0 \le x \le u - 1$, $0 \le y \le m - 2$ and a point $A$. The total number of points is clearly $v$. Denote

$(A, i) = \{A, (i, y) : 0 \le y \le m - 2\}$.

Now for every block $\beta$ of the system $B[K', \lambda', u]$ consider the set $\bigcup_{i \in \beta} (A, i)$. On this set we may construct a system $B(\beta) = B_m[K, \lambda'', (m - 1)|\beta| + 1]$, ($|\beta|$ is the number of elements in $\beta$) in such a way that each of the sets $(A, i)$, $i \in \beta$, appears in $B(\beta)$ as block exactly $\lambda''$ times. We construct now a system $B_m[K, \lambda, v]$ as follows: take all the blocks of all the systems $B(\beta)$, $\beta \in B[K', \lambda', u]$, except the blocks $(A, i)$, $i = 0, 1, \dots, u - 1$, as often as they appear, and the blocks $(A, i)$, $i = 0, 1, \dots, u - 1$, $\lambda$ times each. It is easily checked that the number of elements in every block is a number of $K$ ($m \in K$ by definition) and that each pair appears in exactly $\lambda$ blocks.

In the same way it can be proved:
(3.11) If $v = mu$ where $u \in B(K', \lambda')$ and if for every $k' \in K'$, $mk' \in B^m(K, \lambda'')$ then $v \in B^m(K, \lambda)$, where $\lambda = \lambda'\lambda''$.

Putting in (3.10): $K = \{k\}$ and $m = k$ we obtain
(3.12) If $v = (k - 1)u + 1$, where $u \in B(K', \lambda')$ and if for every $k' \in K'$, $(k - 1)k' + 1 \in B_k(k, \lambda'')$, then $v \in B_k(k, \lambda)$, where $\lambda = \lambda'\lambda''$.

Further we prove:
(3.13) Let $t, s, s + 1 \in B(K, 1)$, $t \in T_q(s)$ and $q \in B(K, 1)$ or $q = 0$ or 1; then $u = st + q \in B(K, 1)$.

Consider a 2-dimensional lattice of points with integral coordinates $0 \le x \le t - 1$, $0 \le y \le s - 1$ and $0 \le x \le q - 1$, $y = s$. Take all the $s$-tuples of $T_q[s, t]$; there are among them $q$ subsystems of $t$ mutually disjoint $s$-tuples each and we adjoin to all the $s$-tuples of the $j$th subsystem, $j = 0, 1, \dots, q - 1$, the point $x = j$, $y = s$. We form now $B[K, 1, u]$ taking the blocks of the $qt$ systems $B[K, 1, s + 1]$ on all so obtained $(s + 1)$-tuples, the blocks of the $t(t - q)$ systems $B[K, 1, s]$ on the remaining $s$-tuples of $T_q[s, t]$, and also all the blocks of the systems $B[K, 1, t]$ on each of the lines $y = j$, $j = 0, 1, \dots, s - 1$, and, if $q > 1$, the blocks of the system $B[K, 1, q]$ on the line $y = s$.
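The simplest instance of this construction is $q = 0$, $s = 3$, $K = \{3\}$, where no points are adjoined and each triple of the transversal system is itself a block. The sketch below is an illustration under stated assumptions, not the author's procedure: the transversal system on three rows is taken from the cyclic Latin square $(a, b) \mapsto a + b \pmod t$, and the system $B[3, 1, 7]$ placed on every line $y = j$ is the set of triples $\{i+1, i+2, i+4\}$ modulo 7. The function names are hypothetical.

```python
from itertools import combinations
from collections import Counter

def pair_counts(blocks):
    """Count how often each unordered pair of points occurs in a block."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts

def construct_q0(t, base_blocks):
    """Case q = 0, s = 3 of the construction: u = 3t points (x, y)."""
    blocks = []
    # Triples meeting each of the lines y = 0, 1, 2 exactly once
    # (assumed here to come from the cyclic Latin square of order t).
    for a in range(t):
        for b in range(t):
            blocks.append([(a, 0), (b, 1), ((a + b) % t, 2)])
    # A copy of the given system B[3, 1, t] on each line y = j.
    for j in range(3):
        for blk in base_blocks:
            blocks.append([(x, j) for x in blk])
    return blocks

fano = [[(i + d) % 7 for d in (1, 2, 4)] for i in range(7)]
design = construct_q0(7, fano)
points = [(x, y) for x in range(7) for y in range(3)]
counts = pair_counts(design)
ok = all(counts[p] == 1 for p in combinations(sorted(points), 2))
print(len(points), len(design), ok)
```

With $t = 7$ this yields 21 points and $49 + 21 = 70$ triples, the block count required of a system $B[3, 1, 21]$.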

By the same proof we may obtain the more general result:
(3.14) Let $t, s, s + 1 \in B(K, \lambda)$, $t \in T_q(s)$ and $q \in B(K, \lambda)$ or $q = 0$ or 1; then $u = st + q \in B(K, \lambda)$.

The following propositions may also be proved in a similar way:
(3.15) Let $t + 1, s \in B(K, \lambda)$ and $t \in T_0(s)$, then $u = st + 1 \in B(K, \lambda)$.
(3.16) Let $t + 1, s, s + 1 \in B(K, \lambda)$, $t \in T_q(s)$ and $q + 1 \in B(K, \lambda)$ or $q = 0$, then $u = st + q + 1 \in B(K, \lambda)$.

4. Block designs with $v = p^{\alpha}$.
(4.1) Let $E$ be a set of $v = p^{\alpha}$ elements ($p$ prime, $\alpha$ a positive integer). We may denote the elements of $E$ as marks in a Galois field (see e.g., [3] pp. 242-288) and more specifically as polynomials $\sum_{i=0}^{\alpha-1} a_i x^i$, $a_i = 0, 1, \dots, p - 1$; $i = 0, 1, \dots, \alpha - 1$. In order to shorten the notation we shall in the sequel denote such marks by $(g)$.

Putting $x^{\alpha} = \sum_{i=0}^{\alpha-1} c_i x^i$, where $x^{\alpha} - \sum_{i=0}^{\alpha-1} c_i x^i = 0$ is an irreducible equation in the field and taking all coefficients modulo $p$, (for $\alpha = 1$ we take for $x$ a primitive root of $p$) we are able to reduce any polynomial to a mark in the Galois field and in the sequel such reduction will always be supposed to be performed.

For $v = p^{\alpha}$, BIBD may in some cases be constructed in a simple way as the following propositions show (compare also [1, 6, 16]).
(4.2) If $v = p^{\alpha}$, then $v \in B(k, k(k - 1))$.

The blocks are:

$\{(g + x^{\beta}), (g + x^{\beta+1}), \dots, (g + x^{\beta+k-1})\}$, $\beta = 0, 1, \dots, v - 2$.

Considering that $g$ obtains the values of all the marks of the Galois field it is sufficient to show that for a fixed $g$ each non-zero mark of the field appears exactly $k(k - 1)$ times as difference between the elements of the blocks. Now for each pair of integers $\gamma, \delta$, ($0 \le \gamma \le k - 1$, $0 \le \delta \le k - 1$, $\gamma \ne \delta$) the differences $(g + x^{\beta+\gamma}) - (g + x^{\beta+\delta}) = x^{\beta}(x^{\gamma} - x^{\delta})$ run for $\beta = 0, 1, \dots, v - 2$ through all the non-zero marks of our field. The number of the pairs $\gamma, \delta$ being $k(k - 1)$ our assertion is proved.

As a further check we remark that the number of blocks in the design should be $\lambda v(v - 1)/(k(k - 1))$. In our case $\lambda = k(k - 1)$ and the number of blocks is as it should be $v(v - 1)$.
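For a prime $v = p$ (the case $\alpha = 1$) the blocks of (4.2) can be generated with ordinary modular arithmetic and the value of $\lambda$ counted directly. This is a hedged sketch only: the helper names are hypothetical, the primitive root is found by brute-force search rather than taken from a table, and prime powers with $\alpha > 1$ are not handled.

```python
from itertools import combinations
from collections import Counter

def primitive_root(p):
    """Smallest primitive root of an odd prime p (brute-force search)."""
    for x in range(2, p):
        if len({pow(x, e, p) for e in range(p - 1)}) == p - 1:
            return x
    raise ValueError("no primitive root found; p must be an odd prime")

def blocks_4_2(p, k):
    """Blocks {g + x^(beta+gamma) : gamma = 0..k-1}, g in Z_p, beta = 0..p-2."""
    x = primitive_root(p)
    return [[(g + pow(x, beta + gamma, p)) % p for gamma in range(k)]
            for g in range(p) for beta in range(p - 1)]

def common_lambda(blocks, v):
    """Return lambda if every pair occurs equally often, else None."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    values = {counts[pair] for pair in combinations(range(v), 2)}
    return values.pop() if len(values) == 1 else None

design = blocks_4_2(7, 3)
print(len(design), common_lambda(design, 7))
```

For $p = 7$, $k = 3$ the count is $7 \cdot 6 = 42$ blocks with $\lambda = k(k-1) = 6$, in agreement with the block count $v(v-1)$ noted above.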

In the same way it may be proved:

(4.3) If $v = p^{\alpha}$, and $q$ is the greatest common factor of $(v - 1)$ and $k$, then $v \in B(k, k(k - 1)/q)$.

The blocks are:

$\{(g + x^{\beta+\gamma+\delta}) : \gamma = 0, (v - 1)/q, 2(v - 1)/q, \dots, (q - 1)(v - 1)/q;\ \delta = 0, 1, \dots, k/q - 1\}$, $\beta = 0, 1, \dots, (v - 1)/q - 1$.


(4.4) If $v = p^{\alpha}$, $q$ is the greatest common factor of $(v - 1)$ and $k$, and 2 is a common factor of $(v - 1)$ and $(k - 1)$, then $v \in B(k, k(k - 1)/(2q))$.
The blocks are:

$\{(g + x^{\beta+\gamma+\delta}) : \gamma = 0, (v - 1)/q, 2(v - 1)/q, \dots, (q - 1)(v - 1)/q;\ \delta = 0, 1, \dots, k/q - 1\}$, $\beta = 0, 1, \dots, (v - 1)/(2q) - 1$.

(4.5) If $v = p^{\alpha}$ and $q$ is the greatest common factor of $(v - 1)$ and $(k - 1)$, then $v \in B(k, k(k - 1)/q)$.
The blocks are:

$\{(g), (g + x^{\beta+\gamma+\delta}) : \gamma = 0, (v - 1)/q, 2(v - 1)/q, \dots, (q - 1)(v - 1)/q;\ \delta = 0, 1, \dots, (k - 1)/q - 1\}$, $\beta = 0, 1, \dots, (v - 1)/q - 1$.

(4.6) If $v = p^{\alpha}$, $q$ is the greatest common factor of $(v - 1)$ and $(k - 1)$, and 2 is a common factor of $(v - 1)$ and $k$, then $v \in B(k, k(k - 1)/(2q))$.

The blocks are:

$\{(g), (g + x^{\beta+\gamma+\delta}) : \gamma = 0, (v - 1)/q, 2(v - 1)/q, \dots, (q - 1)(v - 1)/q;\ \delta = 0, 1, \dots, (k - 1)/q - 1\}$, $\beta = 0, 1, \dots, (v - 1)/(2q) - 1$.
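Proposition (4.4) can be tried out numerically when $v$ is a prime. The sketch below assumes the block formula with exponent $\beta + \gamma + \delta$ (a reading of a damaged display), restricts itself to $\alpha = 1$, and uses hypothetical helper names; it is an illustration rather than the paper's own computation.

```python
from itertools import combinations
from collections import Counter
from math import gcd

def primitive_root(p):
    """Smallest primitive root of an odd prime p (brute-force search)."""
    for x in range(2, p):
        if len({pow(x, e, p) for e in range(p - 1)}) == p - 1:
            return x
    raise ValueError("p must be an odd prime")

def blocks_4_4(p, k):
    """Blocks of (4.4) for prime p, assuming 2 divides p - 1 and k - 1."""
    q = gcd(p - 1, k)
    if (p - 1) % 2 or (k - 1) % 2:
        raise ValueError("(4.4) needs 2 | (v - 1) and 2 | (k - 1)")
    x = primitive_root(p)
    step = (p - 1) // q
    blocks = []
    for g in range(p):
        for beta in range((p - 1) // (2 * q)):
            blocks.append([(g + pow(x, beta + gamma + delta, p)) % p
                           for gamma in range(0, p - 1, step)
                           for delta in range(k // q)])
    return blocks

def design_lambda(p, k):
    """Common pair multiplicity of the blocks, or None if unbalanced."""
    counts = Counter()
    for block in blocks_4_4(p, k):
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    values = {counts[pair] for pair in combinations(range(p), 2)}
    return values.pop() if len(values) == 1 else None

for p, k in [(7, 3), (13, 3), (71, 5)]:
    print(p, k, design_lambda(p, k))
```

In each of the three cases $q = k$, so the expected value is $\lambda = k(k-1)/(2q) = (k-1)/2$.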


5. Block designs: $k = 3$.
(5.1) THEOREM. A necessary and sufficient condition for the existence of BIBD of $v$ elements, with $k = 3$ and any $\lambda$ is that

(i) $\lambda(v - 1) \equiv 0 \pmod 2$ and $\lambda v(v - 1) \equiv 0 \pmod 6$.

PROOF. The necessity of (i) follows from (ii) Section 1. It remains to prove its sufficiency. From (i) follows that

if $\lambda \equiv 1$ or $5 \pmod 6$, then $v \equiv 1$ or $3 \pmod 6$;
if $\lambda \equiv 2$ or $4 \pmod 6$, then $v \equiv 0$ or $1 \pmod 3$;
if $\lambda \equiv 3 \pmod 6$, then $v \equiv 1 \pmod 2$;
if $\lambda \equiv 0 \pmod 6$, there are no restrictions on $v$.

Consequently by (3.8) it remains to be shown that

(5.2) for every $v \ge 3$,

$v \equiv 1$ or $3 \pmod 6$ implies $v \in B(3, 1)$,
$v \equiv 0$ or $1 \pmod 3$ implies $v \in B(3, 2)$,
$v \equiv 1 \pmod 2$ implies $v \in B(3, 3)$
and for every $v$, $v \in B(3, 6)$ holds.
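The case distinction on $\lambda$ follows from condition (i) by elementary arithmetic, and it can be reproduced by enumerating residues. The function below is a small illustrative sketch (its name is an assumption); it lists the residues of $v$ modulo $k(k-1)$ that satisfy both congruences for a given $k$ and $\lambda$.

```python
def admissible_residues(k, lam):
    """Residues v mod k(k-1) with lam*(v-1) = 0 (mod k-1)
    and lam*v*(v-1) = 0 (mod k(k-1))."""
    modulus = k * (k - 1)
    return [v for v in range(modulus)
            if (lam * (v - 1)) % (k - 1) == 0
            and (lam * v * (v - 1)) % modulus == 0]

for lam in (1, 2, 3, 6):
    print(lam, admissible_residues(3, lam))
```

For $k = 3$ the residues are taken modulo 6, so the list for $\lambda = 2$ reads 0, 1, 3, 4, which is the class $v \equiv 0$ or $1 \pmod 3$, and the list for $\lambda = 3$ consists of the odd residues.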


The proof of (5.2) will be given with the help of the following lemmas:
(5.3) If $u \equiv 0$ or $1 \pmod 3$ and $u \ge 3$, then $u \in B(K_3^1, 1)$, where $K_3^1 = \{3, 4, 6\}$.


The proof of this lemma is given by induction. Note that by (2.9),
t ¢ T,(3) whenever ¢ 0, 1 or 3(mod 4) and by (2.13), ¢ e 74(3) when t 
2(mod 4) and t = 6. Consequently 3 ¢ 73(3) and for ¢ 1,¢¢ 7T,(3). Now for 
for u ¢ K; our proposition is trivial and for u = 7 we have: 
(5.3.1)* $7 \in B(3, 1)$, (compare (4.4), the projective plane $PG[2, 2]$).

Elements: $(i)$, $(i = 0, 1, \dots, 6)$.

Blocks: $\{(i + 3^{0}), (i + 3^{2}), (i + 3^{4})\}$.

For other values of $u$, i.e. $u \ge 9$, make use of (3.13) putting $K = K_3^1$, $s = 3$

and taking the values of $q$ and $t$ as follows:


[Table: values of $q$ and $t$ for each residue class of $u$ modulo 9. The entries are not legible in this copy.]

(5.4) If $u \ge 3$, then $u \in B(K_3^2, 1)$ where $K_3^2 = \{3, 4, 5, 6, 8, 11, 14\}$.
The proof is again by induction and we make again use of (2.9) noting that 
te T,(3) whenever t 0, 1, or 3(mod 4) and that ¢t ¢ 7, (4) fort 1,5 and 7. 
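Now for $u \in K_3^2$ the proposition is trivial, for $u = 7$ see (5.3.1) and for other values of $u$ we insert in (3.13) $K = K_3^2$ and take the values of $q$, $s$ and $t$ as follows:

[Table: values of $q$, $s$ and $t$ for each residue class of $u$ modulo 12. The entries are not legible in this copy.]
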
We now proceed Lo prove 5.2 
5 5 * If ] or 3 mod 6 Zs then. é B Se a see also 15. 12, 18] 


For $v = 3$ this is trivial. For $v \ge 7$ we may write $v = 2u + 1$ where $u$ satisfies the conditions of (5.3). Putting in (3.12): $k = 3$, $K' = K_3^1$, $\lambda' = \lambda'' = 1$, it remains by (5.3) and (3.6) to be shown that $2u + 1 \in B(3, 1)$ for $u \in K_3^1$.

For $u = 3$ see (5.3.1) and for $u = 4$ and 6 we have:
(5.5.1)* $9 \in B(3, 1)$, (the Euclidean plane $EG[2, 3]$).
Elements: 
Blocks: 


o. i. 


The propositions denoted by * have been known. They are proved here for the sake of completeness and partly because the new method of proof seemed to be interesting.

(5.5.2)* $13 \in B(3, 1)$, (compare (4.4)).
Elements: $(i)$, $(i = 0, 1, \dots, 12)$.
Blocks: $\{(i + 2^{\beta}), (i + 2^{\beta+4}), (i + 2^{\beta+8})\}$, $\beta = 0, 1$.

(5.6)* If $v \equiv 0$ or $1 \pmod 3$, then $v \in B(3, 2)$, (see also [1]).

Putting in (3.9): $K' = K_3^1$, $K = \{3\}$, $\lambda' = 1$, $\lambda'' = 2$ the proposition follows from (5.3) provided that $v \in B(3, 2)$ for $v \in K_3^1$. For $v = 3$ this is trivial and for $v = 4$ and 6 we have:

(5.6.1)* $4 \in B(3, 2)$, (compare (4.3)).

Elements: $(g)$, $(g = a_0 + a_1 x;\ a_i = 0, 1;\ i = 0, 1)$; $x^2 = x + 1$.
Blocks: $\{(g + x^{0}), (g + x^{1}), (g + x^{2})\}$.

(5.6.2)* 6¢ B(3,2 
Elements: (7,7), (¢ = 0,1, 2;7 = 0,1) 

Blocks: {(i, 7 + 1) (i, i), (¢ + 2°, 7)}, {(4, 0), (¢ + 2°,1), (¢ + 2’, 1) 
{(0, 0), ( 0), (2°, 0)}. 

(5.7) For every $v$, $v \in B(3, 6)$ holds.

Putting in (3.9): $K' = K_3^2$, $K = \{3\}$, $\lambda' = 1$, $\lambda'' = 6$, it remains by (5.4) to be shown that $v \in B(3, 6)$ for $v \in K_3^2$. For $v = 3$ this is trivial and for $v = 4$ and 6 this follows from (5.6.1) and (5.6.2) respectively. For other values of $v$ we have:

(5.7.1) $5 \in B(3, 3)$, (compare (4.5)).
Elements: $(i)$, $(i = 0, 1, 2, 3, 4)$.
Blocks: $\{(i), (i + 2^{\beta}), (i + 2^{\beta+2})\}$, $\beta = 0, 1$.

(5.7.2) 8 « B(3,6), (compare (4.2)). 
Elements: (g), (g = ado + qx + aor a5 = @. =0,1,2);2 =27+1. 
Blocks: {(g + 2°), (g + 2°*), (g + 2°*)} 

(5.7.3) 11 e B,(3,3) 

Put in (3.12): u = 5, K’ = {3}, k = 3, 
and (5.3.1) with (3.6). 

(5.7.4) 14 € B(3, 6) 
Elements: (2), (¢ = 0,1, 
Blocks: {(1 + 2°) mi 

i + 2), i+ 

(A), (2 7 rs. ti i 

(5.8) If $v \equiv 1 \pmod 2$, then $v \in B(3, 3)$.

For $v = 3$ this is trivial, for $v = 5$ see (5.7.1). For $v \ge 7$ we have $v = 2u + 1$, where $u$ satisfies the conditions of (5.4). Putting in (3.12): $k = 3$, $K' = K_3^2$, $\lambda' = 1$, $\lambda'' = 3$ it remains by (5.4) to be shown that $2u + 1 \in B_3(3, 3)$ for $u \in K_3^2$. Making use of (3.6) and (3.8) this follows for $u = 3, 4, 5$ and 6 from (5.3.1), (5.5.1), (5.7.3) and (5.5.2) respectively. For $u = 8$ we have
(5.8.1)* $8 \in B(4, 3)$, (see e.g. [3] p. 429 and [7]).

Elements: (2,7), ( = 0,1, 2,337 0, 1.) 
Blocks: {(0, bo), (1, b1), (2, be), (3, b3)}, >b; = 0(mod 2), 
oa 0), (2, 1)}, Pee. 


’ 


, 


rie A). 
i 
2°) 


’ 


2 
3» 
72 
3 


5 The primitive marks throughout this paper are taken from [3] p. 262 and the primitive roots from [19].

3) follows from (3.12) by putting u = 8, K’ = {4}, k a. 
l 


| and applying on (5.8.1) and (5.5.1). To prove 23 ¢€ B3(3, 3) 
u ik a, & 3, XN’ ae 1, then use (5.7.3) and 
. For u 14 we show 
(5.8.2 14 ¢ B({3, 4}, 3 
Elements: (2), (2 , 1, ---, 12) and (A) 
Blocks: {(¢ + 27), (¢ +27"), (¢ +2”), (¢ +27" 
((A), (6+ 2°), (6+ 2°), (6 + 2°)}, (6 +2! 
29 ¢ B;(3, 3) is obtained by putting in (3.12): u = 
r=2x 1 and applying to (5.8.2), (5.3.1) and (5.5.1). 


6. Block designs: $k = 4$.
(6.1) THEOREM. A necessary and sufficient condition for the existence of BIBD of $v$ elements, with $k = 4$ and any $\lambda$ is that

(i) $\lambda(v - 1) \equiv 0 \pmod 3$ and $\lambda v(v - 1) \equiv 0 \pmod{12}$.

PROOF. The necessity of (i) follows from (11) Section 1. In order to prove its sufficiency we remark that from (i) follows that

if $\lambda \equiv 1$ or $5 \pmod 6$, then $v \equiv 1$ or $4 \pmod{12}$;
if $\lambda \equiv 2$ or $4 \pmod 6$, then $v \equiv 1 \pmod 3$;
if $\lambda \equiv 3 \pmod 6$, then $v \equiv 0$ or $1 \pmod 4$;
if $\lambda \equiv 0 \pmod 6$, there are no restrictions on $v$.

Consequently by (3.8) it remains to be shown that
(6.2) for every $v \ge 4$,

$v \equiv 1$ or $4 \pmod{12}$ implies $v \in B(4, 1)$,
$v \equiv 1 \pmod 3$ implies $v \in B(4, 2)$,
$v \equiv 0$ or $1 \pmod 4$ implies $v \in B(4, 3)$
and for every $v$, $v \in B(4, 6)$ holds.


The proof of (6.2) is analogous to that of (5.2) and will be given with the help of the following lemmas:

(6.3) If $u \equiv 0$ or $1 \pmod 4$ and $u \ge 4$, then $u \in B(K_4^1, 1)$ where $K_4^1 = \{4, 5, 8, 9, 12\}$.

The proof is given by induction. Note that by (2.9), t e T.(4) if t # 2(mod 4) 
and t # 3 and 6(mod 9), and by (2.13), te T,(4) if t F 2(mod 4), t= 3 or 
6(mod 9) and t = 12; consequently t ¢ T,(4) for t = 4, 5 and 8, and ¢t e 7(4) 
if ¢ 0 or 1(mod 4) and t = 9. Now for u ¢ K; the lemma is trivial and for 
u = 13, 28 and 29 we have: 

(6.3.1)* 13 ¢ B(4, 1), (the projective plane PG[2, 3]). 
Elements: (2), (2 Oo: t. += . BB), 


Blocks: {(4 + 2°), (¢ + 2"), (¢ + 2°), (4 + 2°)}. 


3.2)* 28 ¢ B(4, 1), (see [1] 





EXISTENCE AND CONSTRUCTION OF BIBD 


Elements: 


Blocks: 


1 + 3, 0), 
r+ 3,1), 
1+ 6,2), (7+ 1,¢ 
r+ 6, 3) 
1+ 6,0), (2+ 1,2), 
1), (¢+6,1), (¢+ 4,3), (@ 
It may be of interest to note that in this design the 63 blocks form 9 groups of 7 
mutually disjoint quadruples each. 
(6.3.3) $29 \in B(\{4, 5\}, 1)$.

Take 28 elements as in (6.3.2) and an additional element $(A)$. Adjoin this element $(A)$ to each of the 7 (mutually disjoint) quadruples

$\{(i, 0), (i + 6, 1), (i + 5, 2), (i + 3, 3)\}$

of (6.3.2) thus forming 7 quintuples. These quintuples together with the remaining 56 quadruples of (6.3.2) form the required block design.

For other values of $u$ we make use of (3.13) putting $K = K_4^1$, $s = 4$ and taking for $q$ and $t$ the values as follows:

[Table: for each residue class $u \equiv 0, 1, 4, 5, 8, 9, 12, 13 \pmod{16}$, a lower bound on $u$ and the corresponding values of $q$ and $t$. The entries are not legible in this copy.]

(6.4) If $u \ge 4$, then $u \in B(K_4^2, 1)$ where
$K_4^2 = \{4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 18, 19, 22, 23\}$.


The proof is by induction. We shall make use of (2.9) and especially of the 
fact that te T,(4) if t  2(mod 4) and t ¥ 3 and 6(mod 9), further ¢ ¢ T,(5) 
for t = 5, 8 and 13 and 9 ¢ 7,(7). For u ¢ Kj the proposition is trivial, for 
u = 13 see (6.3.1) and for u = 27 and 31 we have: 

(6.4.1)* 31 ¢B(6, 1), (the projective plane PG[2, 5]). 
Elements: ‘2), (¢ = 0,1, --- , 30). 
Blocks: {(¢ -+ 3°), (¢@+ 3'), @+ 37), (+ fi. (7+ Z”. (7+ a 
(6.4.2) $27 \in B(\{4, 5, 6\}, 1)$.
Delete from the block design (6.4.1) any 4 elements no 3 of which are "collinear", e.g., the elements (27), (28), (29) and (30).
For other values of $u$ make use of (3.13) putting $K = K_4^2$ and taking for $q$, $s$ and $t$ the values as follows:


[Table: values of $q$, $s$ and $t$ for each range of $u$. Legible fragments: the range bounds 64, 76, 92, 100, 116, 140, 172, 208, 256, 316, 388, 476, 580 and, for the last range, $q \equiv u \pmod{144}$ with $4 \le q \le 147$. The remaining entries are not legible in this copy.]


We are now able to prove (6.2). 

(6.5) If $v \equiv 1$ or $4 \pmod{12}$, then $v \in B(4, 1)$.

For $v = 4$ this is trivial. For $v \ge 13$ we may put $v = 3u + 1$ where $u$ satisfies the conditions of (6.3). Putting in (3.12): $k = 4$, $K' = K_4^1$, $\lambda' = \lambda'' = 1$ it remains by (6.3) and (3.6) to be shown that $3u + 1 \in B(4, 1)$ for $u \in K_4^1$. For $u = 4$ and 9 this is proved in (6.3.1) and (6.3.2) respectively and for $u = 5$, 8 and 12 we have:

(6.5.1)* 16 ¢ B(4, 1), (the Euclidean plane EG[2, 4]). 
Elements: (9,7), (g = ao + a2; a; = 0,1;7 = 0,1;7 = 0,1,2,3); 2° 
Blocks: {(0,7), (2°, 7), (2,7), (2, 7)}, {(g, 9), (g, 1), (g, 2), (9, 
{(g,0), (g + 2’, 1), (g+2°"',2), (9+ g. Shi. 
(6.5.2)* 25 ¢ B(4, 1), (see [1}). 
Elements: (9g), (g = ao + qz;a, = 0,1,: 
Blocks: {(g),(g + 2°’), (g+2°”), 
(6.5.3) $37 \in B(4, 1)$.
Elements: (2), (¢ = 0,1, --- , 36). 
Blocks: {(2), (¢ + 2°), (¢ + 2°) (7 
(6.6) If $v \equiv 0$ or $1 \pmod 4$, then $v \in B(4, 3)$.

Putting in (3.9): $K' = K_4^1$, $K = \{4\}$, $\lambda' = 1$, $\lambda'' = 3$, this proposition follows from (6.3) provided that $v \in B(4, 3)$ for $v \in K_4^1$. For $v = 4$ this is trivial and for $v = 8$ it is proved in (5.8.1). For other values of $v$ we have:

(6.6.1) 5 e€B(4, 3), (compare (4.3)). 
Elements: (72), (¢ = 0, 1, 2, 3, 4). 
Blocks: {(i + 2”), (¢ + 2"), (@ 

(6.6.2) 9 ¢ B(4, 3), (compare (4.3 


Elements: (9), (g = ao + aux; a, 





EXISTENCE AND CONSTRUCTION OF BIBD 


Blocks: | (¢ +2 ),(g+2), ” ial ,(g+a2°*° 
(6.6.3 We Bi 
Elements: oD jg ae. 5 ag DP 
Blocks: r+ 2,7), (¢@+2,7), 4 94+1), (+27 74+ 
yd), 442,79 ), (i+ 2',7 +2) 
se, OF, CO ce, 4 
(6.7) For every $v$, $v \in B(4, 6)$ holds.

Putting in (3.9): $K' = K_4^2$, $K = \{4\}$, $\lambda' = 1$, $\lambda'' = 6$, the proposition follows from (6.4) provided that $v \in B(4, 6)$ for $v \in K_4^2$. For $v = 4$ this is trivial and for $v = 5$, 8, 9 and 12 it follows from (6.6.1), (5.8.1), (6.6.2) and (6.6.3) respectively. For other values of $v$ we have:

(6.7.1) $6 \in B(4, 6)$.
Elements: 3
Blocks: {(2, 3


(2,9), ( ey hs tae oe 
(6 42° 0). (6 + 2'.0). G 
6.7.2) 7 e B(A4, 2), (compare (4.6 
Elements: ), (4 2 i. 
Blocks: {(2 
6.7. l0e B 
Elements: 


Blocks: 


Elements: 
Blocks: ; 
6.43 l4teF 


Elements: 


Blocks: {(2,. ,(@+3,7), (+ 3,7), @+ 3,7)}, 5 times, 
, 0, 1, 2. 


6.7.6 
Elements: 
Blocks: 


(6.7.7 ISe Bi 


Elements: 
Blocks: 
g,0),(g+2°",0),(g +2" 


18 e¢ B(4, 6) follows from (3.9) with K’ 
applied to (6.6.1 





HAIM HANANI 


(6.7.8 yap 
Elements: (4, J, } = 0,1,2;7 = 0,1;h = 0,1, 2) and (A). 
proces: {(4), (¢,.7,0);, (8,9, 1), (4,3, 2); twice, 
(these blocks show that 19 ¢ B,), 
Q,5,h), (i +2),5,h), GIA), Gj,hk + 1}, 
(2+ 2 ds h), (a+ 2", 3, h), (@+ Zz .3 +1,h+1), 
(A,jg+1,h + 2)}, 
,(@,1,h), (@+2,0h +1), (6 +2,1,h + 1)}. 
(6.7.9) 22 
Put in (3.12): u = 7, K’ = ,k =4, 2, = | and apply to (6.7.2) 
and (6.3.1). 
(6.7.10) 23 e B(4, 6), compare (4. 
Elements: (72), (7 =0,1,-:-, 


Blocks: {(¢ + 5°), (@+5°" 


(6.8) If $v \equiv 1 \pmod 3$, then $v \in B(4, 2)$.
For $v = 4$ this is trivial, for $v = 7$ and 10 it is proved in (6.7.2) and (6.7.3) respectively. For $v \ge 13$ we may put $v = 3u + 1$ where $u$ satisfies the conditions of (6.4). Putting in (3.12): $k = 4$, $K' = K_4^2$, $\lambda' = 1$, $\lambda'' = 2$ it remains by (6.4) to be shown that $3u + 1 \in B_4(4, 2)$ for $u \in K_4^2$. Now for $u = 4, 5, 6, 7, 8, 9$ and 12 this follows from (6.3.1), (6.5.1), (6.7.8), (6.7.9), (6.5.2), (6.3.2) and (6.5.3) respectively; for $u = 10$, 19 and 22 we put in (3.12): $k = 4$, $K' = \{4\}$, $\lambda' = 2$, $\lambda'' = 1$ and apply to (6.3.1) and to (6.7.3), (6.7.8) and (6.7.9) respectively; for $u = 18$ we put in (3.12): $k = 4$, $K' = \{4, 5\}$, $\lambda' = 2$, $\lambda'' = 1$ and apply to (6.7.7), (6.3.1) and (6.5.1). For other values of $u$, namely $u = 11$, 14, 15 and 23, we have:
(6.8.1 11 ¢ B(5, 2), (compare (4.4 
Elements: (72), (¢ = 0, 1, 
Blocks: {(7 + 2”), (¢ + 2”), (¢ + 2°), (7 
34 ¢ By(4, 2) follows from (3.12) with u 
\” = 1 applied to (6.5.1 
(6.8.2 43 € By(4, 2 
Elements: (7,7, h), (2 * ee, a Bs , 1,2) and (A 
Blocks: {(A), (2, 7,0), (7,7, 1), (4, 7, 2)}, twice, 
(these blocks show that 43 ¢ B, 
a”, 5,h), (6+ 3°" 5, h 


(6.8.3) 466 B, 


Elements: 





EXISTENCE AND CONSTRUCTION OF BIBD 375 


Diocks: {(A), (¢,.3; 9), (4,3, 1), (3, 23, twice, 
(these blocks show that 46 ¢ B,), 
(it jh) G+ j+iht+s), (¢+2°",7+2,h + 28), 
(i,j +2,8)}, 6 =0,1;8 = 0,1,2, 
((i,j,h), (@ +29, h), (@+2),5,h4+1), (6+ 27,97,h + 1)}. 
(6.8.4) 70 ¢ B,4(4, 2). 
Elements: (7,7), (¢ = 0,1, --- ,22;7 = 0,1, 2) and (A). 
Blocks: {(A), (7,0), (7, 1), (4, 2)}, 
(these blocks show that 70 € B,), 


(i+5°7), G+ 5° 97), G+ F741), G+ 5% 74+ DI, 
; B=0,1,---, 10. 


twice, 


7. On block designs with $k > 4$. In this section we shall prove some general theorems which will enable us to show by induction the existence of BIBD for some given $k$ and $\lambda$ and an infinite set of values of $v$, provided that for some fixed finite subset of values of $v$ such designs exist.

As an example we shall thereafter use those theorems for discussing the case $k = 5$.
(7.1) Let $a \ge 2$, $d \ge 2$ and $m \ge 2$ be integers and let $R$ be a set of some residue classes modulo $d$ with $0 \in R$. Then there exists an integer $n$ such that for every $u$ satisfying $u \in R \pmod d$ and $u \ge m$, $u \in B(K(a, d, R; m, n), 1)$ holds, where
$K(a, d, R; m, n) = \{a, a + 1, x : x \in R \pmod d \text{ and } m \le x < n\}$.

Let $p_i$, $i = 1, 2, \dots, h$, be the primes $p_i \le a$ and $\alpha_i$, $i = 1, 2, \dots, h$, the smallest integers satisfying $p_i^{\alpha_i} \ge a$; further let $N$ be the smallest common multiple of $\prod_{i=1}^{h} p_i^{\alpha_i}$ and $d$, and $\delta$ the smallest integer satisfying $\delta N \ge m$. We take $n = a(a + \delta)N + m$ and obtain the proof of our proposition by induction. For $u \in K(a, d, R; m, n)$ the proposition holds trivially and for $u \in R \pmod d$, $u \ge n$ we make use of (3.13) putting $q \equiv u \pmod{aN}$, $m \le q < m + aN$; $s = a$, $t = a^{-1}(u - q)$ and $K = K(a, d, R; m, n)$. The conditions of (3.13) are satisfied because by definition $a \in K$ and $a + 1 \in K$, further $q \equiv u \pmod d$ because $d$ is a factor of $N$, also $m \le q < m + aN < n$ and consequently $q \in K$. As for $t$ we have $t \ge q$ and by (2.9), $t \in T_t(a)$, consequently by (2.4), $t \in T_q(a)$; we have also $t \equiv 0 \pmod d \in R \pmod d$ and by induction assumption we may put $t \in B(K, 1)$.
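The bound $n$ in this proof is explicit and can be evaluated for given $a$, $d$, $m$. The following sketch assumes the reading $p_i^{\alpha_i} \ge a$ for the condition defining $\alpha_i$ and $\delta N \ge m$ for the one defining $\delta$; the function names are hypothetical.

```python
from math import gcd

def primes_up_to(a):
    """All primes p <= a, by trial division."""
    return [p for p in range(2, a + 1)
            if all(p % r for r in range(2, int(p ** 0.5) + 1))]

def bound_n(a, d, m):
    """Return (n, N, delta) with n = a*(a + delta)*N + m as in (7.1)."""
    product = 1
    for p in primes_up_to(a):
        power = p
        while power < a:          # smallest power of p that is >= a
            power *= p
        product *= power
    N = product * d // gcd(product, d)   # least common multiple
    delta = -(-m // N)                   # smallest integer with delta*N >= m
    return a * (a + delta) * N + m, N, delta

print(bound_n(5, 5, 5))
```

For $a = d = m = 5$ the prime powers are 8, 9 and 5, so $N = 360$, $\delta = 1$ and $n = 5 \cdot 6 \cdot 360 + 5 = 10805$, with $aN = 1800$ as the modulus for $q$.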

In the sequel we shall use (7.1) with the values $a = d = m = k$, $\delta = 1$ exclusively. Now the set $K(k, k, R; k, n)$ has a large number of elements and is therefore inconvenient in applications. We can however by methods of Section 3 and especially proposition (3.13) reduce this set to its subset

$K(k, R) \subset K(k, k, R; k, n)$

with relatively few elements. We obtain thus from (7.1):
(7.2) Let $R$ be a set of some residue classes modulo $k$ with $0 \in R$. Then there exists a finite set $K(k, R)$ of integers (which includes the integers $k$ and $k + 1$ and whose all other elements belong to $R \pmod k$) such that for every $u$ satisfying $u \in R \pmod k$ and $u \ge k$, $u \in B(K(k, R), 1)$ holds.

From (7.2) and from (3.9), (3.12) and (3.11) respectively we obtain (with notation of (7.2)):
(7.3) If for every $k' \in K(k, R)$, $k' \in B(k, \lambda)$ holds then for every $v \in R \pmod k$, $v \in B(k, \lambda)$ holds as well.
(7.4) If $v = (k - 1)u + 1$ where $u \in R \pmod k$ and $u \ge k$ and if for every $k' \in K(k, R)$, $(k - 1)k' + 1 \in B_k(k, \lambda)$ holds, then $v \in B_k(k, \lambda)$.
(7.5) If $v = ku$, where $u \in R \pmod k$ and $u \ge k$ and if for every $k' \in K(k, R)$, $kk' \in B^k(k, \lambda)$ holds, then $v \in B^k(k, \lambda)$.
(7.6) We shall now use the obtained results for finding conditions under which BIBD with $k = 5$ exist. From (11) Section 1 follows that the necessary condition for the existence of such designs is

$\lambda(v - 1) \equiv 0 \pmod 4$ and $\lambda v(v - 1) \equiv 0 \pmod{20}$.

For specific values of $\lambda$ the necessary conditions imposed on $v$ are accordingly:

(i) for $\lambda \equiv 1, 3, 7, 9, 11, 13, 17$ or $19 \pmod{20}$, $v \equiv 1$ or $5 \pmod{20}$;
(ii) for $\lambda \equiv 2, 6, 14$ or $18 \pmod{20}$, $v \equiv 1$ or $5 \pmod{10}$;
(iii) for $\lambda \equiv 4, 8, 12$ or $16 \pmod{20}$, $v \equiv 0$ or $1 \pmod 5$;
(iv) for $\lambda \equiv 5$ or $15 \pmod{20}$, $v \equiv 1 \pmod 4$;
(v) for $\lambda \equiv 10 \pmod{20}$, $v \equiv 1 \pmod 2$;
(vi) for $\lambda \equiv 0 \pmod{20}$, every $v$.

We shall show that in the cases (i), (iii) and (vi) the above necessary conditions are also sufficient.⁶ By (3.8) it suffices to prove the following


THEOREM.

$v \equiv 1$ or $5 \pmod{20}$ implies $v \in B(5, 1)$,
$v \equiv 0$ or $1 \pmod 5$ implies $v \in B(5, 4)$
and for every $v$, $v \in B(5, 20)$ holds.

This is proved in (7.10),⁷ (7.11) and (7.12) respectively. Regarding the case (iv) it shall be proved in (7.13) that $v \equiv 1 \pmod 4$ implies $v \in B(5, 5)$, provided that $v \in B_5(5, 5)$ for $v = 4u + 1$, $u \in K(5, \{0, 1, 2, 3, 4\})$, (see (7.9)). Concerning the case (ii) it has been proved by Nandi [14] (see also [4, 7]) that no BIBD $B[5, 2, 15]$ exists, which shows that in this case the necessary condition is not generally sufficient.

We begin with proving a general result, namely 


oil RUS.) Gisto Ss ¢ Ss Sa for every R. 


By definition 0 ¢ R. We make use of (3.13) by putting s 5 and taking 
as q — ft 4 O(mod 5), t # 2, 4, 6(mod 8), t + 3, 6(mod 9). For u = 580 we 


6 With the possible exception of $v = 141$ in the case (i).

put accordingly the values of $q$ and $t$ as follows:


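$580 \le u \le 690$: $q = u - 575$, $t = 115$
$691 \le u \le 810$: $q = u - 675$, $t = 135$
$811 \le u \le 960$: $q = u - 800$, $t = 160$
$961 \le u \le 1110$: $q = u - 925$, $t = 185$
$1111 \le u \le 1290$: $q = u - 1075$, $t = 215$
$1291 \le u \le 1470$: $q = u - 1225$, $t = 245$
$1471 \le u \le 1680$: $q = u - 1400$, $t = 280$
$1681 \le u \le 2010$: $q = u - 1675$, $t = 335$
$2011 \le u \le 2400$: $q = u - 2000$, $t = 400$
$2401 \le u \le 2850$: $q = u - 2375$, $t = 475$
$2851 \le u \le 3390$: $q = u - 2825$, $t = 565$
$3391 \le u \le 4050$: $q = u - 3375$, $t = 675$
$4051 \le u \le 4830$: $q = u - 4025$, $t = 805$
$4831 \le u \le 5790$: $q = u - 4825$, $t = 965$
$5791 \le u \le 6870$: $q = u - 5725$, $t = 1145$
$6871 \le u \le 8160$: $q = u - 6800$, $t = 1360$
$8161 \le u \le 9750$: $q = u - 8125$, $t = 1625$
$9751 \le u \le 10804$: $q = u - 9725$, $t = 1945$
$u \ge 10805$: $q \equiv u \pmod{1800}$, $5 \le q \le 1804$, $t = (u - q)/5$
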

(7.8) K(5, {0, 1}) = {5, 6, 10, 11, 15, 16, 20, 35, 36, 40, 70, 71, 75, 76}. 

We shall prove that for every $u \ge 5$ satisfying $u \equiv 0$ or $1 \pmod 5$, $u \in B(K(5, \{0, 1\}), 1)$ holds. For $u \in K(5, \{0, 1\})$ the proposition is trivial and for $u = 31$ see (6.4.1). For $u = 21$, 41 and 45 we have:

(7.8.1)* 21 ¢ B(5, 1), (the projective plane PG[2, 4]). 

Elements: (7,7), (@¢@ = 0,1, --- ,6;7 = 0,1, 2). 

Blocks: {(¢ + 3°,7), (@+ 3,7), (@+ 34,7), 74+ 1), (2,7 + 2)}. 
(7.8.2)* 41 ¢ B(5, 1), (see [1}). 

Elements: (7), (¢ = 0,1, --- , 40). 

Blocks: {(¢ + 6%), (¢ + 6"), (¢ + 6 *!) (¢ + 6) (¢ + GPT): 


’ 


(7.8.3)* 45 « B(5, 1), (see [1] 


J 


Elements: (9,7), (g = a , = 0,1,2;4 = 0,1;j 


Blocks: g, 0), te; (g, 2), (g, 3), Cg, 4)3 


na (g - got? j as 1). (g + ors 

(9,79 + 3)}, 

For $u = 46$, 50, 51 put in (3.16): $s = 5$, $q = u - 46$, $t = 9$; for $u = 120$, 121 use (3.13) with $s = 10$, $q = u - 110$, $t = 11$; for $u = 151$ use (3.15) with $s = 6$, $t = 25$; for $u = 271$ use (3.13) with $s = 10$, $q = 21$, $t = 25$; and for $u \ge 580$ see (7.7). For other values of $u$, $u \equiv 0$ or $1 \pmod 5$, we make use of (3.13) with $s = 5$, putting for $q$ and $t$ the following values:


[Table: values of $q$ and $t$ for the remaining ranges of $u$. The entries are not legible in this copy.]

(7.9) $K(5, \{0, 1, 2, 3, 4\}) = \{t : 5 \le t \le 20\} \cup \{22, 23, 24, 27, 28, 29, 32, 33, 34, 38, 39\}$.
We shall prove that for every $u \ge 5$, $u \in B(K(5, \{0, 1, 2, 3, 4\}), 1)$ holds. For $u \in K(5, \{0, 1, 2, 3, 4\})$ the proposition is trivial and for $u = 21$ and 31 see (7.8.1) and (6.4.1) respectively. For $u = 37$, 44, 49 and 58 we have:
(7.9.1) $37 \in B(\{5, 9\}, 1)$.
Elements: $(i, j)$, $(i = 0, 1, \dots, 6;\ j = 0, 1, 2, 3)$ and $(A, h)$, $(h = 0, 1, \dots, 8)$.
Blocks: Out of the elements $(i, j)$, $(i = 0, 1, \dots, 6;\ j = 0, 1, 2, 3)$ form the design (6.3.2) and adjoin the element $(A, h)$ to each of the 7 disjoint quadruples of the $h$th group, $(h = 0, 1, \dots, 8)$. Further form the block $\{(A, h) : h = 0, 1, \dots, 8\}$.
(7.9.2)* 49 e« B(7,1), (the Euclidean plane £G[2, 7]). 
Elements: (2,7), (¢ = 0,1, ---,6;7 = 0,1, ---, 6). 
proces:  ((0, 7), (1, 3); (2,3), 3.2); (69), (8 72s CDi, 
'¢e, 0), Ce, 1), G, 2), G, 3), (2, 4), G, 5), (4, 6D}, 
(i, 0), (¢ + 3°, 1), (6 + 3°", 2), (¢ + 3°, 3), (6 + 3°, 4) 
(¢+3°% 5), (6+ 3°°,6)}, 6 =0,1,-:: 


’ 


By 
(7.9.3) $44 \in B(\{5, 6, 7\}, 1)$.
Delete from the design (7.9.2) any 5 elements no 3 of which are collinear, e.g. the elements: (0, 0), (0, 1), (1, 0), (1, 1), (2, 2).
(7.9.4)* 64 « B(8, 1), (the Euclidean plane EG[2, 8}). 
Elements: (g,7),(g = a + ax + aor’; a; = 0,1;7 = 0,1, 2; 
3=0,1,--- ,i;2 =aet+. 
Blocks: {(0,7), (1,7), (2,7), (2,7), 1 + 2,7), (x + 2’,j) 
(l+a2+2',j),(1+2',J)}, 
1(g, 0), (g, 1), (¢, 2), (g,.3), (g, 4), (g, 5), (a, 6), Cg, i, 
{(g,0), (g + 2°, 1), (g + 2°", 2), (g + 2°, 3), (9g + 2, 4), 
(g+a°* 5), (g+ 2°", 6), (9+ 2°°5,7)}, 6=0,1,---,6. 
(7.9.5) $58 \in B(\{5, 6, 7, 8\}, 1)$.

Delete from the design (7.9.4) any 6 elements no 4 of which are collinear, e.g. the elements: (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2).

For $u \ge 580$ see (7.7) and for all other values of $u$ we make use of (3.13) taking for $q$, $s$ and $t$ the values shown in the accompanying table.

We are now able to prove the theorem stated in (7.6):

(7.10) If $v \equiv 1$ or $5 \pmod{20}$ and $v \ne 141$, then $v \in B(5, 1)$.

For $v = 5$ the proposition is trivial. For $v \ge 21$ we may write $v = 4u + 1$ with $u \equiv 0$ or $1 \pmod 5$ and $u \ge 5$. Putting in (7.4): $k = 5$, $R = \{0, 1\}$, $\lambda = 1$ and considering (3.6) it remains to be shown that $4u + 1 \in B(5, 1)$ for $u \in K(5, \{0, 1\})$, (see (7.8)). For $u = 5$, 10 and 11 this is proved in (7.8.1), (7.8.2) and (7.8.3) respectively and for other values of $u$ we prove:

({72O01)* 25.€ B3(5, 1), (the Euclidean plane £G[2, 5}). 
Elements: (3,7), (¢ = 0, 1, 2,3,4;7 = 0, 1, 2, 3, 4). 


[Table for (7.9): values of $q$, $s$ and $t$ for the remaining values of $u$. The entries are not legible in this copy.]


Blocks: {(0,7), (1,7), (2,7), (3,7), (4,7 
(these blocks show that 25 e 
t, ©), (s, 2), 6, 2), 


1,0), (¢+ 2° 1) 


(7.10.2)* 61 ¢ B(5, 1), (see 
Elements: (2), ( es 0, 


))- 
, 60). 


(1 
: 
Blocks: {(i + 2”), (¢+2 apeidy (4 + DPM) Cg + DPT) (Cg 4 QT) 
B = 0, 1, 2. 


(7.10.3)* 65 € B(5, 1), (see [1]). 
Elements: (7,7), (¢ = 0,1, ---,12 i 0, iL, 2, 3, 4). 
Blocks: {(2, 0), (2,1), (2), 3), (2, 4) 
{¢+2° 7), (+2 0), (i 4 Ms a +1), (¢+ 2°" 741), 
(i,j +3)}, 6=0,1,2 
(7.10.4) 81 ¢ B(5, 1) 
Elements: (g), (g = > eno O's a = 0, 1,253 = 0 1, 2,3); 
a = 27° + 27 +e+1. 
Blocks: {(g + 2°*7), (g + 2?t7*"*), (g + x¥t7*®), (9 bf oo. 
(g + gett) B= 
(7.10.5) $141 \in B(5, 1)$?

So far no proof is available. On the other hand we remark that in the proof of $u \in B(K(5, \{0, 1\}), 1)$ for $u > 35$ (in Section (7.8)), we made no use of $35 \in K(5, \{0, 1\})$ and therefore the omission of proof of (7.10.5) does not impair the validity of proposition (7.10) for other values of $v$.

(7.10.6) 145 e B(5, 1). 
Elements: (2,7), (¢ = 0,1, --- , 28;7 = 0, 1, 2,3, 4). 





HAIM HANANI 


(10.7 163 
Elements 


>] ] 
OCKS 


1 


7.10.8 28] 
Elements 
Bloc ks 


7.10.9 285 ¢ 


Elements: | ne’ ee , 2, 3, 4) and A, k), 
(h O, 3, 2, a 4). 
3, 4) take the 61 elements 
oe. is 0, 1, --- , 55) and (A, h), (h 0. 1, 2. d. 4) and 
form a design B[5, 1, 61] as in (7.10.2) such that 
{(A,Q0), (A, 1), (A, 2), (A, 3), (A, 4)} 
is one of the blocks. The union of the systems B[5, 1, 61] for 
7 0, 1, 2, 3, 4 and of the system 7'56{5, 56] with 


( ~ = 


, 7) ¥(@, 24 0, 1,--- , 55), 3 


Blocks For every j, (J 


gives the required design. 
(7.10.10) $301 \in B(5, 1)$.
Elements: $(i, j)$, $(i = 1, 2, \dots, 60;\ j = 0, 1, 2, 3, 4)$ and $(A)$.

Blocks: Consider the system $B[5, 1, 61]$ constructed in (7.10.2). For every quintuple $\{(0), (b_1), (b_2), (b_3), (b_4)\}$ of this system containing the element $(0)$ take the set of 21 elements $(b_1, j)$, $(b_2, j)$, $(b_3, j)$, $(b_4, j)$, $(j = 0, 1, 2, 3, 4)$ and $(A)$ and form out of them the system $B[5, 1, 21]$ as in (7.8.1).

For every quintuple $\{(a_0), (a_1), (a_2), (a_3), (a_4)\}$ of $B[5, 1, 61]$ which does not contain the element $(0)$ form the blocks

$\{(a_0, j), (a_1, j + \alpha), (a_2, j + 2\alpha), (a_3, j + 3\alpha), (a_4, j + 4\alpha)\}$,

$(j = 0, 1, 2, 3, 4;\ \alpha = 0, 1, 2, 3, 4)$. All the blocks so constructed together with the above-mentioned systems $B[5, 1, 21]$ form the required design.
7.10.11 305 ¢ B(5, 1 
Klements 


Blocks 


mod 5 , then 





(7.3) with k 


forv ¢ K(5, {0, 1} 

from (6.8.1 

t.4283 
Elements: 
Blocks: 

4112 
Elements: 
Blocks: 


Elements: 
Blocks: 


7.11.4 16 ¢ 
Elements: 


Blocks 


4112 20 ¢ B 
Elements: 
Blocks: 


7.11.6 oo 
Elements: 


Blocks: 


fe 36 ¢ B 


Elements: 


Blocks: 


b é (2, 


EXISTENCE 


5, Rk 


. For other values of 2 


i. EFA t we 
, (see (7.8)). For 2 


AND CONSTRUCTION 


we prove: 


OF 


BIBD 


5 this is trivial and for 1 


381 


have to prove thatv ¢ B(5, 4 


11 it follows 





382 HAIM HANANI 





(7.11.8) $40 \in B(5, 4)$.
Elements: (9, 7),(g do + ax + ax’; a, = 0,1;7 = 0,1, 2; 
7=0,1,2,3,4);7 =2+1. 
(g, 0), (9, 1), (g, 2), (¢g, 3), (g, 4)}, 4 times, 
(gt2°,j),(g+2°",97), gt 2°%,54+1), 94+ 2"%,54+1) 
(g3+3)}], @=90,1,---,6. 





Blocks: 


j 
t 
‘ 
' 


(7.11.9) 70 ¢ B(5, 4). 
Elements: (?,7,h), (¢ = 0,1, 2,3,4;7 = 0,1;hk = 0,1, --- ,6). 
Blocks: For every h, (h = 0, 1, --- , 6) form the blocks 
{(ao,h), (a1, h), (a2,h), (a3, h), (a4, h)}, 
where {(ao), (a:), (a2), (a3), (as)} are blocks of the design 
B[5, 4, 10] formed out of the elements (7,7), (¢ = 0, 1, 2, 3, 4; 
j 0, 1), (see (7.11.2) ). Further form the blocks: 
((i,j,h), G+ 277 7 +6,h +3”), (6+ 27 "7 +6,h + 3°) 
C425 44,8499"), 064+ FT, 5 +4 4+1847") 
6 = 01, 2-4 = @. i-3 = ©, t. 
7.11.10) 71 ¢€ B(5, 2), (compare (4.4)). 
(2 O. 1. -+-, 4). 
Blocks: {(¢ + 7°), (¢ + 7°*™), (¢ + 7°™ 


Klements (2 
. = 8 +42 * —=B+56 
mAs | a ), (@ + é yt, 

B = 0, i. sia 6. 


’ 


Put in (3.11): m », U 15, K’ K LB}, A’ 1,” 1 and apply to 
‘.1] 1 (7.10.1 
2 it B } 

Klement 0, | \4 0, 1, 2, 3, 4) and 1 

ie App the design 7 1L.11L) tothe elements 1 - S. , 14; 

() a ob. 4 Phe ck My I y be arranged in s ich a wav that 

{ oO 1 14 

© , . 











(7.9) $K(5, \{0, 1, 2, 3, 4\}) = \{x : 5 \le x \le 20,\ 22, 23, 24, 27, 28, 29, 32, 33, 34, 38, 39\}$.

We shall prove that for every $u \ge 5$, $u \in B(K(5, \{0, 1, 2, 3, 4\}), 1)$ holds.
For $u \in K(5, \{0, 1, 2, 3, 4\})$ the proposition is trivial and for $u = 21$ and $31$ see
(7.8.1) and (6.4.1) respectively. For $u = 37, 44, 49$ and $58$ we have:

(7.9.1) $37 \in B(\{5, 9\}, 1)$.
Elements: $(i, j)$, $(i = 0, 1, \cdots, 6;\ j = 0, 1, 2, 3)$ and $(A, h)$,
$(h = 0, 1, \cdots, 8)$.
Blocks: Out of the elements $(i, j)$, $(i = 0, 1, \cdots, 6;\ j = 0, 1, 2, 3)$ form
the design (6.3.2) and adjoin the element $(A, h)$ to each of the 7
disjoint quadruples of the $h$th group, $(h = 0, 1, \cdots, 8)$. Further
form the block $\{(A, h) : h = 0, 1, \cdots, 8\}$.
(7.9.2)* $49 \in B(7, 1)$, (the Euclidean plane EG[2, 7]).
Elements: $(i, j)$, $(i = 0, 1, \cdots, 6;\ j = 0, 1, \cdots, 6)$.
Blocks: $\{(0, j), (1, j), (2, j), (3, j), (4, j), (5, j), (6, j)\}$,
$\{(i, 0), (i, 1), (i, 2), (i, 3), (i, 4), (i, 5), (i, 6)\}$,
$\{(i, 0), (i + 3^{\beta}, 1), (i + 3^{\beta+2}, 2), (i + 3^{\beta+1}, 3), (i + 3^{\beta+4}, 4),$
$(i + 3^{\beta+5}, 5), (i + 3^{\beta+3}, 6)\}$, $\beta = 0, 1, \cdots, 5$.
(7.9.3) $44 \in B(\{5, 6, 7\}, 1)$.
Delete from the design (7.9.2) any 5 elements no 3 of which are collinear, e.g. 
the elements: (0, 0), (0, 1), (1, 0), (1, 1), (2, 2). 
(7.9.4)* $64 \in B(8, 1)$, (the Euclidean plane EG[2, 8]).
Elements: $(g, j)$, $(g = a_0 + a_1 x + a_2 x^2;\ a_i = 0, 1;\ i = 0, 1, 2;$
$j = 0, 1, \cdots, 7)$; $x^3 = x + 1$.
Blocks: $\{(0, j), (1, j), (x, j), (x^2, j), (1 + x, j), (x + x^2, j),$
$(1 + x + x^2, j), (1 + x^2, j)\}$,
$\{(g, 0), (g, 1), (g, 2), (g, 3), (g, 4), (g, 5), (g, 6), (g, 7)\}$,
{(9, 0), (9 + 2°, 1), (g + 2°", 2), (g + 2°", 3), (g + 2", 4), 
(g + nr, 5), (g + oy 6), (g + er 7}, B= 0,1,---,6. 
(7.9.5) $58 \in B(\{5, 6, 7, 8\}, 1)$.

Delete from the design (7.9.4) any 6 elements no 4 of which are collinear,
e.g. the elements: (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2).

For $u = 580$ see (7.7) and for all other values of $u$ we make use of (3.13)
taking for $q$, $s$ and $t$ the values shown in the following table.

[Table: ranges of $u$ with the corresponding values of $q$, $s$ and $t$; the entries are not recoverable from the source.]

We are now able to prove the theorem stated in (7.6):

(7.10) If $v \equiv 1$ or $5 \pmod{20}$ and $v \ne 141$, then $v \in B(5, 1)$.

For $v = 5$ the proposition is trivial. For $v \ge 21$ we may write $v = 4u + 1$
with $u \equiv 0$ or $1 \pmod 5$ and $u \ge 5$. Putting in (7.4): $k = 5$, $R = \{0, 1\}$, $\lambda = 1$
and considering (3.6) it remains to be shown that $4u + 1 \in B(5, 1)$
for $u \in K(5, \{0, 1\})$, (see (7.8)). For $u = 5$, $10$ and $11$ this is proved in (7.8.1),
(7.8.2) and (7.8.3) respectively and for other values of $u$ we prove:

(7.10.1)* $25 \in B_5(5, 1)$, (the Euclidean plane EG[2, 5]).
Elements: $(i, j)$, $(i = 0, 1, 2, 3, 4;\ j = 0, 1, 2, 3, 4)$.
Blocks: $\{(0, j), (1, j), (2, j), (3, j), (4, j)\}$,
(these blocks show that $25 \in B_5$),
$\{(i, 0), (i, 1), (i, 2), (i, 3), (i, 4)\}$,
$\{(i, 0), (i + 2^{\beta}, 1), (i + 2^{\beta+1}, 2), (i + 2^{\beta+3}, 3), (i + 2^{\beta+2}, 4)\}$,
$\beta = 0, 1, 2, 3$.
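
The claim that the lines of the Euclidean plane EG[2, 5] form a design with $k = 5$, $\lambda = 1$ on 25 elements can be checked mechanically. The following is a minimal sketch, not part of the original construction: the function names `affine_plane_blocks` and `pair_multiplicities` are illustrative, and the slanted lines are generated with an arbitrary nonzero multiplier `a` in place of the powers $2^{\beta}$ (which run through all nonzero residues mod 5).

```python
from collections import Counter
from itertools import combinations


def affine_plane_blocks(p):
    """Lines of the affine plane over the integers mod a prime p, as point sets."""
    blocks = []
    for j in range(p):  # lines {(0, j), (1, j), ..., (p-1, j)}
        blocks.append(frozenset((i, j) for i in range(p)))
    for i in range(p):  # lines {(i, 0), (i, 1), ..., (i, p-1)}
        blocks.append(frozenset((i, j) for j in range(p)))
    for a in range(1, p):  # a plays the role of 2^beta (any nonzero residue)
        for i in range(p):
            blocks.append(frozenset(((i + a * j) % p, j) for j in range(p)))
    return blocks


def pair_multiplicities(blocks):
    """Count, for every unordered pair of points, the blocks containing it."""
    counts = Counter()
    for block in blocks:
        counts.update(combinations(sorted(block), 2))
    return counts


blocks = affine_plane_blocks(5)
counts = pair_multiplicities(blocks)
all_pairs = 25 * 24 // 2
# Number of blocks, whether every pair is covered, and the set of multiplicities.
print(len(blocks), len(counts) == all_pairs, set(counts.values()))
```

The same two helpers apply unchanged to `affine_plane_blocks(7)` for the plane EG[2, 7] of (7.9.2).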
(7.10.2)* 61 e« B(5, 1), (see [1]). 
Elements: (7), (¢ = 0,1, --- , 60). 
Blocks: {(i + 2”), (¢ + 21"), (¢ + 2°), (6 + Qt) (¢ 4+ QH*)) 
B = 0, 1, 2. 
(7.10.3)* 65 e B(5, 1), (see [1]). 
Elements: (7,7), (¢ = 0,1, ---,12;7 = 0,1, 2, 3, 4). 
Blocks: {(7, 0), (7,1), (4, 2), (¢, 3), (4, 4)}, 
(t+ 27,9), (6+ 2% 9), (GOP 541), +2" 754+ 0), 
(1,7 + 3)}, 6 = 0,1, 2. 
(7.10.4) 81 ¢B(5, 1). 
Elements: (9g), (g = Dino a2’; a; = 0, 1, 2;4 = 0, 1, 2, 3); 
f= 27° + 27? +241. 
Blocks: {(g + get) (g + ght rts, (g + gitar) (9 + tht rts) 
(g+2%*7)), B=0,1;7 = 0,1. 
(7.10.5) $141 \in B(5, 1)$?

So far no proof is available. On the other hand we remark that in the proof of
$u \in B(K(5, \{0, 1\}), 1)$ for $u > 35$ (in Section (7.8)), we made no use
of $35 \in K(5, \{0, 1\})$ and therefore the omission of proof of (7.10.5) does not
impair the validity of proposition (7.10) for other values of $v$.

(7.10.6) $145 \in B(5, 1)$.
Elements: $(i, j)$, $(i = 0, 1, \cdots, 28;\ j = 0, 1, 2, 3, 4)$.
Blocks: $\{(i, 0), (i, 1), (i, 2), (i, 3), (i, 4)\}$,
(6 +2", 5), (6 +27", 5), (6+ 27 741), 6+ 2°" 54-1), 
(4,7 +3)}, 6 =0,1,---,6. 
(7.10.7) 161 e B(5, 1) 
Elements: (1, 7), @ = = _O, 2;j » 6). 
Blocks: {(7,7), 


Eanes G45 ial 2 5+3'7)}, 
8 = 0,1;7 = 0,1, 2, 
+ 5", i), (i+ 5"), 
i,j + 3°), (¢,7 + 3°}. 


(7.10.8) 281 e B(5, 1). 
Elements: (7), (¢ = 0,1, --- , 280). 
Blocks: {(¢ + 3”). (¢ + geetss (i+ get a (i ss greties) 
(i+ 34%) g = 0,1, ---, 13. 


(7.10.9) $285 \in B(5, 1)$.
Elements: $(i, j)$, $(i = 0, 1, \cdots, 55;\ j = 0, 1, 2, 3, 4)$ and $(A, h)$,
$(h = 0, 1, 2, 3, 4)$.
Blocks: For every $j$, $(j = 0, 1, 2, 3, 4)$ take the 61 elements
$(i, j)$, $(i = 0, 1, \cdots, 55)$ and $(A, h)$, $(h = 0, 1, 2, 3, 4)$ and
form a design $B[5, 1, 61]$ as in (7.10.2) such that
$\{(A, 0), (A, 1), (A, 2), (A, 3), (A, 4)\}$
is one of the blocks. The union of the systems B[5, 1, 61] for 
= 0, 1, 2, 3, 4 and of the system 7's6[5, 56] with 
vr; =j) {(#, :¢ = 0, 1,--- , 55}, 7 = 0, 1, 2, 3, 4, 
gives the required design. 
(7.10.10) $301 \in B(5, 1)$.
Elements: $(i, j)$, $(i = 1, 2, \cdots, 60;\ j = 0, 1, 2, 3, 4)$ and $(A)$.
Blocks: Consider the system $B[5, 1, 61]$ constructed in (7.10.2). For every
quintuple $\{(0), (b_1), (b_2), (b_3), (b_4)\}$ of this system containing the
element $(0)$ take the set of 21 elements $(b_1, j)$, $(b_2, j)$, $(b_3, j)$,
$(b_4, j)$, $(j = 0, 1, 2, 3, 4)$ and $(A)$ and form out of them the
system $B[5, 1, 21]$ as in (7.8.1).
For every quintuple $\{(a_0), (a_1), (a_2), (a_3), (a_4)\}$ of $B[5, 1, 61]$ which
does not contain the element $(0)$ form the blocks
$\{(a_0, j), (a_1, j + \alpha), (a_2, j + 2\alpha), (a_3, j + 3\alpha), (a_4, j + 4\alpha)\}$,
$(j = 0, 1, 2, 3, 4;\ \alpha = 0, 1, 2, 3, 4)$. All the blocks so constructed
together with the above-mentioned systems $B[5, 1, 21]$ form the required
design.
(7.10.11) 305 e B(5, 1). 
Elements: (1,7), (¢ = 0,1, --- ,60;7 = 0,1, 2,3, 4). 
Blocks: {(7, 0), (¢, 1), (4, 2), (%, 3), (4, 4)}, 
(i+ 2°, 7), (G+ 2°™, 7), (G+ PH G41), (+2, 5 4+ 1), 
(,7+3)}, 8 =0,1,---, 14. 
(7.11) If $v \equiv 0$ or $1 \pmod 5$, then $v \in B(5, 4)$.

By (7.3) with $k = 5$, $R = \{0, 1\}$, $\lambda = 4$ we have to prove that $v \in B(5, 4)$
for $v \in K(5, \{0, 1\})$, (see (7.8)). For $v = 5$ this is trivial and for $v = 11$ it follows
from (6.8.1). For other values of $v$ we prove:

(7.11.1) 6 B(5, 4). 
Elements: (7,7), (¢ = 0,1,2;7 = 0,1). 
Blocks: {(7,7), (¢ + 2°,j), (¢+ 2,7), (¢@+2).94+1), (4+ 2',74+1)}. 
(7.11.2) 10 ¢ B(5, 4). 
Elements: (2,7), (¢ = 0,1, 2;7 = 0, 1, 2) and (A). 
Blocks: {(A), (4,7), (@+ 1,7), @4,j7+1), (¢+1,74+ 2)}, 
((0,7), (1,9), (2,9), (4,9 +1), (@ + 1,7 + 2)}. 
(7.11.3) 15 e B(5, 4), (for nonexistence of B[5, 2, 15] see [14, 4, 7]). 
Elements: (2,7), (¢ = 0, 1, 2,3, 4;7 = 0, 1, 2). 
Blocks: {(t,7), (¢+ 2,7), (@+ 3,7), @,74+1), (¢+4,74+2)}, 
{(4,j), (@¢+ 1,9), (4,9 +1), (¢€+2,7 +1), (4,94 2)}, 
{(4,0), (¢ + 2,1), (¢ + 3,1), («+ 4,1), (¢ + 1, 2)}, 
{(4,0), (¢+ 1,0), (¢+ 2,1), (¢+ 3,2), (¢4+ 4,2)}, 
{(0, w), (1, a), (2, a), (3, a), (4, @)}, a = 0,% 
(7.11.4) 16 ¢ B(5, 4), (compare (4.3)). 
Elements: (9), (g = >. j-0 42°; a; = 
Blocks: {(g + 2°), (g+ 2°"), (g+a 


-¢=0,1,2,3);2° =2+1. 
), (g + g*). (g + a? t)) | 
6 = 0, 1,2. 


i 
+6 


(7.11.5) 20 ¢ B(5, 4). 

Elements: (7,7), (¢ = 0, 1, 2, 3,4;7 = 0, 1, 2, 3). 

Blocks: {(7,7), (¢+ 4,7), (4,7 +1), (@+2,j7 +1), (4,7 + 2)}, 
{(4,7), (@+1,7), @79+1), (¢@+3,74+1), (@+1,7 + 3)}, 
{(4,7), (@ + 4,79), (¢ + 1,9 +1), (4,9 + 2), (¢ + 2,7 + 2)}, 
{(4,a@), (t+ 1l,@), (@¢+2,a+1), (¢+4,a+1), (¢ + 3,3) 

a=0,1,2; for a=2, take a+1= 
((0, 3), (1, 3), (2, 3), (3, 3), (4, 3)}. 
(7.11.6) 35 € B(5, 2). 

Elements: (2,7), (¢ = 0,1, --: ,6;7 = 0, 1, 2, 3, 4). 

Blocks: {(7, 0), (2, 1), (4, 2), (4, 3), (4, 4)}, twice, 
((i+3°,9), G+ 3°", 7), €+ 3°", 5 +1), (6+ 3°", 75 +1) 

(,j7+3)}, 8 =0,1,2. 


! 
S> 


(7.11.7) 36 ¢ B(5, 4). 
Elements: (g, 7), (g = a + mz; aq = 0,1, 2;i=0,1;7 = 0,1,2,3); 
g = 2¢+1. 

Blocks: {(g + 2”,j),(g+ gP*? 5), (g+ oe 5 +1), (9,7 + 2), 
(g+2°** 7+ 3)}, 8 =0,1,2,3, 
gta" 9), (9g +2°",7), (G5 +1), (9,5 + 2), (9,5 + 3)}, 
y = 0,1, 

(9,7), (9g +23), (9+ 2,9), (9 + 2,3), (9 + 2°,J)}. 







(7.11.8) 40 ¢ B(5, 4). 
Elements: (g,7),(g = ao + ax + ax”; a, = 0,1;7 = 0,1, 2; 
j = 0,1,2,3,4);2 =2+1. 
Blocks: {(g, 0), (g, 1), (g, 2), (g, 3), (g, 4)}, 4 times, 
(9g +25), (9 +25), 9+ 27,541), (9+ 2"%,5 +1), 


(9,7 + 3)}, 6 =0,1,---,6. 
(7.11.9) 70 ¢ B(5, 4). 


Elements: (7,j,h), (¢ = 0,1, 2,3,4;7 = 0,1;h = 0,1,--- , 6). 
Blocks: For every h, (h = 0, 1, --- , 6) form the blocks 
{(@o,h), (a1,h), (a2,h), (as, h), (a4, h)}, 
where {(ao), (a1), (a2), (as), (a4)} are blocks of the design 
B[5, 4, 10] formed out of the elements (7,7), (¢ = 0, 1, 2, 3, 4; 
j = 0,1), (see (7.11.2)). Further form the blocks: 
((i,5,h), (G+ 27 5 + Oh +3”), (i+ BVM GF +4,h + 3°") 
(EFM Gtyh+ 3"), (G+ Mityr+1h+3™)} 
8 = 0,1,2;7 = 0,1;6 = 0, 1. 


(7.11.10) 71 ¢ B(5, 2), (compare (4.4)). 
Elements: (72), (¢ = 0,1, --- , 70). 
Blocks: {(¢+ 7), (6+ 7™), (G+ 7™), (G+ 7™), G+ P™}, 


B=0,1,---,6. 
(7.11.11) $75 \in B_5(5, 4)$.
Put in (3.11): $m = 5$, $u = 15$, $K' = K = \{5\}$, $\lambda' = 4$, $\lambda'' = 1$ and apply to
(7.11.3) and (7.10.1).
(7.11.12) $76 \in B(5, 4)$.
Elements: $(i, j)$, $(i = 0, 1, \cdots, 14;\ j = 0, 1, 2, 3, 4)$ and $(A)$.
Blocks: Apply the design (7.11.11) to the elements $(i, j)$,
$(i = 0, 1, \cdots, 14;\ j = 0, 1, 2, 3, 4)$. The design may be arranged in such a way that
among the blocks should appear the quintuples
$\{(i, 0), (i, 1), (i, 2), (i, 3), (i, 4)\}$, $i = 0, 1, \cdots, 14$,
four times each. Leave all other blocks of (7.11.11) without
change and instead of the block $\{(i, 0), (i, 1), (i, 2), (i, 3), (i, 4)\}$
taken 4 times take the design (7.11.1) on the elements:
$(A), (i, 0), (i, 1), (i, 2), (i, 3), (i, 4)$, $i = 0, 1, \cdots, 14$.
(7.12) For every $v$, $v \in B(5, 20)$ holds.
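
As a side check on why $\lambda = 20$ imposes no arithmetic restriction on $v$ when $k = 5$, one can evaluate the standard counting identities $r = \lambda(v - 1)/(k - 1)$ and $b = \lambda v(v - 1)/(k(k - 1))$ for the number of blocks through an element and the total number of blocks. The helper below is a small sketch with an illustrative name (`bibd_counts`); it tests only these necessary divisibility conditions and says nothing about existence.

```python
def bibd_counts(v, k, lam):
    """Return (r, b) for a design B[k, lam, v], or None if r or b is not an integer."""
    r_num = lam * (v - 1)          # r = lam (v - 1) / (k - 1)
    b_num = lam * v * (v - 1)      # b = lam v (v - 1) / (k (k - 1))
    if r_num % (k - 1) or b_num % (k * (k - 1)):
        return None
    return r_num // (k - 1), b_num // (k * (k - 1))


# With k = 5 and lam = 20 both quantities are integers for every v >= 5.
always_integral = all(bibd_counts(v, 5, 20) is not None for v in range(5, 200))
print(always_integral, bibd_counts(7, 5, 20), bibd_counts(7, 5, 1))
```

For smaller $\lambda$ the same function shows which residues of $v$ survive, for instance $v \equiv 1$ or $5 \pmod{20}$ when $\lambda = 1$.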

By (7.3) with $k = 5$, $R = \{0, 1, 2, 3, 4\}$, $\lambda = 20$ we have to prove
that $v \in B(5, 20)$ for $v \in K(5, \{0, 1, 2, 3, 4\})$, (see (7.9)). For $v = 5$ this is trivial
and for $v = 6, 10, 11, 15, 16$ and $20$ this follows from (7.11.1), (7.11.2), (6.8.1),
(7.11.3), (7.11.4) and (7.11.5) respectively. For other values of $v$ we have:
(7.12.1) 7 ¢ B(5, 10), (compare (4.5)). 

Elements: (2), (¢ = 0,1, --- , 6). 

Blocks: {(i), (¢ + 3°), (¢ + 3°"), (¢ + 3°), (¢ + 3°™)}, @=0,1,2. 
(7.12.2) 8 e B(5, 20), (compare (4.2)). 

Elements: (g), (g = ao + aw + a2”; a; = 0,1;7 = 0,1,2);2° =2+1. 

Blocks: {(g + 2°), (9 + 2°”), (9 + 2°”), (9 + 2”), (9 + 2}, 
B= 0,1, ---, 6. 





(7.12.3) $9 \in B(5, 5)$, (compare (4.5)).
Elements: $(g)$, $(g = a_0 + a_1 x;\ a_i = 0, 1, 2;\ i = 0, 1)$; $x^2 = 2x + 1$.
Blocks: $\{(g), (g + x^{\beta}), (g + x^{\beta+2}), (g + x^{\beta+4}), (g + x^{\beta+6})\}$, $\beta = 0, 1$.
(7.12.4) 12 ¢ B(5, 20). 
Elements: (9,7), (9g = a + m2; a; = 0,1;% = 0,1;7 = 0, 1, 2); 
rv=2 eh 
Blocks: {(g + 2°,j), (9+ 2°", 3), (9 + 2°", 9), (9 + 2,5 + 1), 
(g+2°",7+2)}, 6B =0,1 2, twice, 
(gt+ 2,7), (g+2°", 9), (9+ 2°", 54+1), (g + 2°", 7 +1), 
(g+2°" 7+ 2)}, 6=0,1,2, 
fg+2,9),(9+ 2,3), (9+ 27,3), (I+ 1), (G5 + 2)}, 
twice. 
(7.12.5) $13 \in B(5, 5)$, (compare (4.5)).
Elements: $(i)$, $(i = 0, 1, \cdots, 12)$.
Blocks: $\{(i), (i + 2^{\beta}), (i + 2^{\beta+3}), (i + 2^{\beta+6}), (i + 2^{\beta+9})\}$, $\beta = 0, 1, 2$.
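
Cyclic constructions of this kind are easy to verify by developing the base blocks modulo $v$ and counting pairs. The sketch below does this for $v = 13$. It assumes the base blocks are $\{0, 2^{\beta}, 2^{\beta+3}, 2^{\beta+6}, 2^{\beta+9}\}$ for $\beta = 0, 1, 2$; the exponent offsets 3, 6, 9 are a reconstruction, since the extracted text does not preserve them, and the function names are illustrative.

```python
from collections import Counter


def develop(base_blocks, v):
    """All translates i + B (mod v), i = 0..v-1, of each base block B."""
    return [frozenset((i + x) % v for x in base)
            for base in base_blocks for i in range(v)]


def pair_count_values(blocks, v):
    """Set of values taken by the number of blocks containing a given pair."""
    counts = Counter()
    for block in blocks:
        pts = sorted(block)
        for s in range(len(pts)):
            for t in range(s + 1, len(pts)):
                counts[(pts[s], pts[t])] += 1
    # Pairs that never occur are reported with multiplicity 0.
    return {counts.get((a, b), 0) for a in range(v) for b in range(a + 1, v)}


v, root, step = 13, 2, 3   # 2 is a primitive root mod 13; step is the assumed offset
base_blocks = [[0] + [pow(root, beta + step * t, v) for t in range(4)]
               for beta in range(3)]
blocks = develop(base_blocks, v)
print(base_blocks)
print(len(blocks), pair_count_values(blocks, v))
```

A single value in the printed set means the developed blocks form a balanced design with that $\lambda$; replacing `(v, root, step)` by `(17, 3, 4)` with four base blocks gives the analogous check for $v = 17$.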
(7.12.6) 14 ¢ B(5, 20). 
Elements: (7,7), (¢ = 0,1, ---,6;7 = 0,1). 
Blocks: {(¢,j), (¢ + 3°,j), @+3°",7), + 3°" 7 +1), 
(¢+ 3° 7+1)}, 6 =0,1,2, twice, 
{(a, 9), (6+ 3", 7), (@+3°% 9), (6 4+3°, 741), (+399, 74+1)}, 
6 = 0,1, 2, 
{(i,j), (¢ + 3,7), @ + 3°", 9), (6 + 37%, 5), (4,9 + 1D}, 
+ = 0,1, twice. 
(7.12.7) $17 \in B(5, 5)$, (compare (4.5)).
Elements: $(i)$, $(i = 0, 1, \cdots, 16)$.
Blocks: $\{(i), (i + 3^{\beta}), (i + 3^{\beta+4}), (i + 3^{\beta+8}), (i + 3^{\beta+12})\}$,
$\beta = 0, 1, 2, 3$.
(7.12.8) 18 ¢ B(5, 20). 
Elements: (9,7), (g = a + ax; a; = 0,1,2;7 = 0,1;7 = 0,1); 
ag = 22 +1. 
Blocks: {(9,3), (g+2°,3),(g+2°",9), 9 + 2°",5 + 1), 
(g+2°*7+1)}, 6 =0,1,2,3, twice, 
(9,7), (9+ 2,3), (9g + 2", 35), (9 + 2,541), 
(gg+2"* 7+}, 8 =0,1,2,8, 
{(9, 9), (g + 27,3), (g +2", 9), (g + 2°", j + 8), 
(g+a7**™, 7+ 5)}, y=0,1; 6=0,1, 
{93),(9+2°,7), 9 +255), G+2,54+1), (9 + 2,54 1)}. 
(7.12.9) 19 e B(5, 10), (compare (4.5) ). 
Elements: (7), (¢ = 0,1, ---, 18). 
Blocks: {(i), (i + 2°), (¢ + 2°77), (¢ + 2°), (¢ + 2°*™)}, 
8 = 0,1,--- 
(7.12.10) 22 ¢ B(5, 20). 
Elements: (7,7), (¢ = 0,1,---,10;7 = 0,1). 







Blocks: {(7,7), (¢ + 5) at 2m" 7), a+r 7+ 1), 
(¢+2°*° 54+1)}, 6 =0,1,2,3,4, twice, 
{(4,j), (@+ 27,7), @ +27" 9), (6 +277 41), (6+ 29°54 1)}, 
B = 0, 1, 2, 3, 4, 
{(¢ + 2°75), (6+ 2°", 5), (6 +27, 7), (6 + 29°55), (474+ 1}, 
B = 0,1,2, 3,4, 
{(¢+2°, 7), ¢+27,9), (@+ 25,7), 6 + 25,9), 6+ 25,7)}. 
(7.12.11) 23 « B(5, 10), (compare (4.5)). 
Elements: (7), (¢ = 0,1, --- , 22). 
Blocks: {(i), (¢+ 5°), (¢+ 5°"), (¢ + 5° 


(7.12.12) 24 e B(5, 20). 
Elements: (9,7), (g = ao + ax + aoa"; a; = 0,137 
2=2 + 1. 
Blocks: {(g + 2,3), (g+ 2°** 35), (g+ vj + 1), ( 
(g,j + 2)}, 
{(9,3), (9 +2,3), (9 +2°",3), (9.5 +1), Cf 


{(g + 2,5), (gt 27,35), 9 +2”, 5), 
(g+27° 7+ 1)}, 
i(g + 2’, a, (g + a’ 3), (g + ot. (g 
(g+ 2°", 5+ UI, 
i(g + #3), (g + 2’, 3), (g + xj); (g,j +1),(g+ x, j 
(7.12.13) 27 e« B(5, 10), (compare (4.5) ). 

Elements: (g), (g = ao + az + a2; a; = 0,1,2;i = 0,1,2);2° = 2 
Blocks: {(g), (g + 2°), (g+2°"), (g+ 2°” +14 


3 8 
‘),(g +2 )}, 


B= 


es pee 
(7.12.14) 28 e B(5, 20). 
Elements: (1,7), (¢ = 0,1, ---,6; gj = 0,1, 2, 3). 
Blocks: {(i + 3°, 7), (¢ + 3°", 7), (¢ + 3°, 7 +1), (4,7 + 2), 
(i+ 3°" 74+ 3)}, 6=0,1,2, taken 3 times, 
(i+3° 7), @+3°* 5), GQg7+), 5+ 2), 7 + 3d}, 
B=0,1,2, twice, 
J), (4,9 + 1), (4,7 + 2), 
y = 0,1, taken 3 times, 
J), i+ 3,j), G74+1), GI+ 3)} 
3 times, 


>~+2 


ros 2 2 ° Qo yT2 = Ps oe 
{((¢ + 3’,7), (¢ + 3°",9), 4 +3 


y+4 


ay 


{(¢ + 3° j), @+3 
(4,7), @+3°97),44+377, 64+ 39), 45+ DI, 
(7.12.15) $29 \in B(5, 5)$, (compare (4.5)).
Elements: $(i)$, $(i = 0, 1, \cdots, 28)$.
Blocks: $\{(i), (i + 2^{\beta}), (i + 2^{\beta+7}), (i + 2^{\beta+14}), (i + 2^{\beta+21})\}$, $\beta = 0, 1, \cdots, 6$.

(7.12.16) 32 ¢ B(5, 20), (compare (4.2 
Elements: ‘g),(g = > i0a,x';a; = 0,1;4 = 0,1, 2, 





Blocks: {(g + 2°), (g + 2°"), (g + 2°"), (g + 2°), ( 


(7.12.17) 33 e B(5, 5). 
Elements: (7,7), (@ = 0,1,---,10; j = 0,1, 2). 
Blocks: {(i + 2°,j), (¢ +2°°, 7), (¢ + 27,7 +1), (6 + 2°*4,7 +1), 
(4,7 +2)}, 6B = 0,1, 2, 3, 4, 
{i+ 2,7), 6 +29), @ +29), 6 + 2°95), G+ 27, A}, 
(4,7), @+2°,7), G7 +1), 6+2,74+1), (+ 2,54+ 2), 
{(i, 3), (¢ + 2', 9), (6 + 2,9), (4,7 +1), (6,9 + 2)}. 
(7.12.18) 34 e B(5, 20). 
Elements: (7, j), (¢ = 0,1, ---,16; j = 0,1). 
Blocks: {(i,7), (¢ + 3°,j), (¢+ 3°", 7), ¢ + 3°", 741), 
(+3? 7+1)} 6 =0,1,---,7, 
{(¢, 9), @+ 3°97), @+3°°, 9), GI+D, G+ 39% 74+ 1}, 
6 =0,1,---,7, 
{(¢ + 37,7), (6 +37, 5), (6+ 3°", 9), (64+ 3°"™, 5), 
(4,7 + 1)}, vy = 0,1, 2,3, twice, 
{(i, 7), @+ 3,7), @+3'",j), G+ 39°" 741), 
G+3°"% 7+1)}, » =1,2,3,5,6, 
(i,j), @+ 3,97), @+3°%,7), G+ 39°", 54+ 1), 7+ VD}, 
y 0, 4, 
{(i,9), (¢+3°,7), (6+ 3°97), (6+ 3%,9), (G+ 3", /)}, 
{(, 7), (+ 3,9), (6+ 34,9), (6+ 3,9), (6 + 3", 9}. 
(7.12.19) 38 € B(5, 20). 
Elements: (1,7), (¢ = 0,1, ---,18;7 = 0,1). 
Blocks: {(i,7), (¢ + 2°,j), (¢+2°",7), (¢ +27". +1), 
(6+ 2°? 54+ 1)}, B an 2. cee has 
{(t,j), (¢+ 27,7), (@+27,7), @+2°"%, 5+ 1), 
(¢+277 7+1)}, 7 =0,1,---,8, 
{(¢ + 2°" 5), (§ + 2° 5), (8 + OY, 5), 
(+ 2°*** 5) (¢,7+1)}, 6=0,1,2;€ =0,1, 
{(i, 7), (¢ + 2,9), (¢ + 2°", 5), (6 + 2", 5), (6 + 27", 5}, 
6 = 0, 1, 2, 
{(i, 7), (@ + 27,9), (6+ 2°,7), (6+ 2",9), (6,9 + 1)}. 
(7.12.20) 39 ¢ B(5, 10). 
Elements: (7,7), (¢ = 0,1, ---,12; j = 0,1, 2). 
Blocks: {(i + 2°, 7), (¢ + 277° 7), (¢ + 2°*% 7541), 6+ 2°°,5 +1), 
(t,j7 + 2)}, 8 =0,1,2, twice, 
\(¢ + 2°. 3), (7+ > a, (7+ , J), (+ , a. 
(4,7 +1)}, 8 =0,1,2, 
{(¢ + 27,7), (6 + 27™* 5), (6 + 27, 5), (46,541), (4,5 + 2)}, 
y = 0,1, taken 5 times. 
(7.13) If v= 1(mod 4), then v ¢ B(5, 5), provided that v ¢ B;(5, 5) forv = 
4u+l1,ue K(5, {0, 1, 2, 3, 4}). 





For $v = 5$ the proposition is trivial, for $v = 9$, $13$ and $17$ see (7.12.3), (7.12.5)
and (7.12.7) respectively and for $v \ge 21$ apply (7.4) with $k = 5$, $R = \{0, 1, 2, 3, 4\}$,
$\lambda = 5$.


REFERENCES
[1] R. C. Bose, "On the construction of balanced incomplete block designs," Ann. Eugenics, Vol. 9 (1939), pp. 353-399.
[2] R. H. Bruck and H. J. Ryser, "The nonexistence of certain finite projective planes," Canadian J. Math., Vol. 1 (1949), pp. 88-93.
[3] R. D. Carmichael, Groups of Finite Order, Dover Pub., New York, 1956.
[4] W. S. Connor, Jr., "On the structure of balanced incomplete block designs," Ann. Math. Stat., Vol. 23 (1952), pp. 57-71.
[5] L. Euler, "Recherches sur une nouvelle espèce de quarrés magiques," Verhandelingen uitgegeven door het Zeeuwsch Genootschap der Wetenschappen te Vlissingen, Vol. 9 (1782), pp. 85-239; also Commentationes Arithmeticae, Vol. 2, Petrograd, 1849, pp. 302-361; and Leonardi Euleri Opera Omnia, Teubner, Leipzig and Berlin, Ser. I, Vol. 7 (1923), pp. 291-392.
[6] Marshall Hall, Jr., "Cyclic projective planes," Duke Math. J., Vol. 14 (1947), pp. 1079-1090.
[7] Marshall Hall, Jr. and W. S. Connor, "An embedding theorem for balanced incomplete block designs," Canadian J. Math., Vol. 6 (1954), pp. 35-41.
[8] M. Hall, Jr., "A Survey of Combinatorial Analysis," Some Aspects of Analysis and Probability, John Wiley & Sons, New York, 1958, pp. 76-104.
[9] Haim Hanani, "On quadruple systems," Canadian J. Math., Vol. 12 (1960), pp. 145-157.
[10] Harris F. MacNeish, "Euler squares," Ann. Math., Vol. 23 (1922), pp. 221-227.
[11] H. B. Mann, "On the construction of sets of orthogonal Latin squares," Ann. Math. Stat., Vol. 14 (1943), pp. 401-414.
[12] E. Hastings Moore, "Concerning triple systems," Mathematische Annalen, Vol. 43 (1893), pp. 271-285.
[13] Eliakim Hastings Moore, "Tactical memoranda," Amer. J. Math., Vol. 18 (1896), pp. 264-303.
[14] H. K. Nandi, "On the relation between certain types of tactical configurations," Bull. Calcutta Math. Soc., Vol. 37 (1945), pp. 92-94.
[15] (M.) Reiss, "Ueber eine Steinersche combinatorische Aufgabe," J. für die reine und angewandte Mathematik, Vol. 56 (1859), pp. 326-344.
[16] James Singer, "A theorem in finite projective geometry and some applications to number theory," Trans. Amer. Math. Soc., Vol. 43 (1938), pp. 377-385.
[17] J. Steiner, "Combinatorische Aufgabe," J. für die reine und angewandte Mathematik, Vol. 45 (1853), pp. 181-182.
[18] Th. Skolem, "Some remarks on the triple systems of Steiner," Mathematica Scandinavica, Vol. 6 (1958), pp. 273-280.
[19] W. L. Stevens, "The completely orthogonalized Latin square," Ann. Eugenics, Vol. 9 (1939), pp. 82-93.
[20] G. Tarry, "Le problème des 36 officiers," Compte rendu de la session, Assoc. française pour l'avancement des sciences, Vol. 1 (1900), pp. 122-123, Vol. 2 (1901), pp. 170-203.
[21] Oswald Veblen and W. H. Bussey, "Finite projective geometries," Trans. Amer. Math. Soc., Vol. 7 (1906), pp. 241-259.
[22] G. Wertheim, "Tabelle der kleinsten primitiven Wurzeln aller ungeraden Primzahlen unter 3000," Acta Mathematica, Vol. 17 (1893), pp. 315-319.
[23] Ernst Witt, "Über Steinersche Systeme," Abhandlungen aus dem Mathematischen Seminar der Hansischen Universität, Hamburg, Vol. 12 (1938), pp. 265-275.





RANDOM ALLOCATION DESIGNS II: APPROXIMATE THEORY FOR
SIMPLE RANDOM ALLOCATION¹


By A. P. Dempster


Harvard University

1. Introduction. 

1.1. Aims. In Section 2 of a previous paper [1] a viewpoint was described 
under which one can compare within one framework a wide class of randomized 
experimental designs, namely those designs called in [1] random allocation de- 
signs. To make such comparisons one needs to know how each design performs, 
which, from our point of view, means finding the variances of linear unbiased 
estimators under the randomization hypothesis. The aim of this paper is to take 
a beginning step towards finding some variances analytically. 

The previous paper [1] defined some very general classes of techniques of 
linear unbiased estimation. The practical use of these general methods is in- 
hibited by two difficulties. Firstly, the computations with data are laborious 
and unfamiliar, and, secondly, the calculation of the variances of the estimators 
presents formidable mathematical difficulties. In this paper we avoid the first 
difficulty by considering a smaller class of estimators, within the general class, 
consisting only of estimators which lead to familiar data computations. The 
mathematical difficulties remain however, so that the calculation of variances is 
attempted only by indirect approximate methods which apply only to simple 
random allocation designs as defined in [1]. The restriction to simple random 
allocation, although very stringent, does allow answers to some interesting 
questions, for this case is, in a sense, the most radical form of random balance 
design. 

1.2. The class of data analysis techniques. We postulate data consisting of $n$
quantities corresponding to a subset of $n$ of the $N$ cells of a complete crossed
$k$-factor array, i.e.,

$N = R \cdot C \cdots L$,

where $R, C, \cdots, L$ are the numbers of levels of the first, second, $\cdots$, $k$th
factors. In broad terms, the techniques under consideration have two stages, the
first stage consisting of the least squares estimation of a selected set of effects,
and the second stage consisting of further estimation and testing based on the
residuals from the first stage.

When a mathematical statistician is confronted with an unbalanced fraction
of a complete array, his first thought is to set up a linear model with various
selected main effect and interaction terms plus random errors, and then to
estimate these effects by least squares. It is assumed that the reader knows how
to do this (e.g., by setting up the so-called normal equations and solving them
as in Wilks [4] p. 192). This is precisely the first stage of our analysis. We do
not, however, commit ourself to the model which led to this first stage estimation,
but proceed to look for further effects. In general, we denote by $m$ the number of
effects selected for fitting at the first stage, and we may choose any $m$ such that
$0 \le m \le n$. In practice one always chooses $m \ge 1$, for one always fits at least
the grand mean, and one generally allows $n - m$ to be moderately large in order
to have some interesting variation left in the residuals.

Received December 2, 1959; revised September 26, 1960.

¹ This research has been supported in part by the United States Navy through the Office
of Naval Research, under contract Nonr 1866 (37). Reproduction in whole or in part is
permitted for any purpose of the United States Government.

The second stage of our analysis uses as input the n residuals after the first 
stage fitting. The simplest way to think of the second stage computations is to 
imagine a complete array of N cells whose n entries corresponding to the ob- 
served cells are the residuals from the first stage, and whose N — n entries corre- 
sponding to the remaining cells are all zero. The second stage computations are 
simply the usual analysis of variance computations on this complete array of 
size N, i.e., the usual linear estimators and the usual mean squares. Of course, 
the linear estimators and mean squares corresponding to effects estimated at the 
first stage will be zero, and it is only the remaining effects, and not necessarily 
all of these, which are of interest at the second stage. 

It is clear that the linear estimators calculated at the second stage are not un- 
biased, for the N — n zeros in the array have the effect of reducing the absolute 
value of the average of such an estimator. Thus a correction factor is needed to 
make such a raw estimator into an unbiased estimator. The approximate theory 
of Section 2 suggests, at least for the case of simple random allocation, that an 
approximate correction factor for unbiasedness is

(1.2.1)  $c = \dfrac{N - m}{n - m}$.

It is also suggested, again for the case of simple random allocation, that ratios of 
mean squares coming from the second stage analysis of variance may be tested 
as F statistics with the same degrees of freedom as would be used if the full 
array of size N were observed. In effect this suggestion is saying that the F tests 
are sufficiently robust to be approximately valid when applied to non-normal 
data where the observations are zero with probability 1 — (n/N) and other 
quantities with probability n/N. This suggestion arises more precisely from the 
approximate theory of Section 2. 

The foregoing is intended to be a verbal description of how to carry out data 
analysis for a general technique in the class of techniques under consideration. 
To be more concrete we take the example of a 3-factor design where $N = R \cdot C \cdot L$.
The choice of a particular technique in the available class of techniques is made
by deciding which effects to estimate by least squares. In our example let us
decide to estimate by least squares the grand mean, the row main effects and the
column main effects. Thus we observe a subset of $n$ of the $N$ quantities $v_{ijk}$ for
$1 \le i \le R$, $1 \le j \le C$ and $1 \le k \le L$, and, as the first stage of analysis, we find
$\hat\theta_{\cdot\cdot}$, $\hat\theta_{i\cdot}$ and $\hat\theta_{\cdot j}$ which minimize the quantity

$\sum (v_{ijk} - \theta_{\cdot\cdot} - \theta_{i\cdot} - \theta_{\cdot j})^2$,

where summation is over the $n$ observed cells. Because of linear constraints
(e.g., we may require $\sum_{i=1}^{R} \theta_{i\cdot} = \sum_{j=1}^{C} \theta_{\cdot j} = 0$) we are here fitting $m = R + C - 1$
parameters. We shall assume for simplicity that all of the parameters are uniquely
estimable. The first step in the second stage of analysis is to consider the array

$Y_{ijk} = v_{ijk} - \hat\theta_{\cdot\cdot} - \hat\theta_{i\cdot} - \hat\theta_{\cdot j}$ if cell $(i, j, k)$ is observed,
$Y_{ijk} = 0$ otherwise.

From this array one computes in the usual way linear estimators $\hat\theta_{\cdot\cdot k}$, $\hat\theta_{ij\cdot}$, $\hat\theta_{i\cdot k}$,
$\hat\theta_{\cdot jk}$ and $\hat\theta_{ijk}$, and also the corresponding mean squares $(MS)_L$, $(MS)_{RC}$, $(MS)_{RL}$,
$(MS)_{CL}$ and $(MS)_{RCL}$. Here, for example,

$\hat\theta_{\cdot\cdot k} = \dfrac{1}{RC} \sum_{i=1}^{R} \sum_{j=1}^{C} Y_{ijk} - \dfrac{1}{RCL} \sum_{i=1}^{R} \sum_{j=1}^{C} \sum_{k=1}^{L} Y_{ijk} = \dfrac{1}{RC} \sum_{i=1}^{R} \sum_{j=1}^{C} Y_{ijk}$,

$(MS)_L = \dfrac{RC}{L - 1} \sum_{k=1}^{L} (\hat\theta_{\cdot\cdot k})^2$.

Supposing now that the $n$ observations come from a simple random allocation
scheme we multiply these second stage linear estimators by $c$ from (1.2.1) to
make them approximately unbiased, and we test the mean squares, for example
by regarding $(MS)_L/(MS)_{RCL}$ as an F-statistic on $L - 1$ and $(R - 1)(C - 1)(L - 1)$
degrees of freedom.
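
The two stages just described can be sketched numerically. The following is an illustrative NumPy implementation under stated assumptions, not the author's computational procedure: the function name `two_stage`, the reference-level dummy coding of the row and column effects, and the simulated data are all choices made here for the example. The residuals, and hence the second stage, do not depend on which coding of the first-stage model is used.

```python
import numpy as np


def two_stage(v, observed):
    """Two-stage analysis of an R x C x L array v observed on a boolean mask."""
    R, C, L = v.shape
    cells = np.argwhere(observed)        # indices of observed cells, row-major
    y = v[observed]                      # observed data in the same order
    n = y.size

    # Stage 1: least squares fit of grand mean, row and column main effects,
    # coded with reference-level dummies (R + C - 1 columns).
    X = np.zeros((n, R + C - 1))
    X[:, 0] = 1.0
    for row, (i, j, _k) in enumerate(cells):
        if i > 0:
            X[row, i] = 1.0
        if j > 0:
            X[row, R - 1 + j] = 1.0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    m = int(np.linalg.matrix_rank(X))    # number of fitted parameters

    # Stage 2: residuals in the observed cells, zeros in the other N - n cells.
    Y = np.zeros(v.shape)
    Y[observed] = y - X @ beta

    theta_L = Y.mean(axis=(0, 1)) - Y.mean()          # raw third-factor effects
    ms_L = R * C * np.sum(theta_L ** 2) / (L - 1)

    # Three-factor interaction residual of the usual complete-array analysis.
    inter = (Y - Y.mean(axis=2, keepdims=True) - Y.mean(axis=1, keepdims=True)
             - Y.mean(axis=0, keepdims=True) + Y.mean(axis=(1, 2), keepdims=True)
             + Y.mean(axis=(0, 2), keepdims=True) + Y.mean(axis=(0, 1), keepdims=True)
             - Y.mean())
    ms_RCL = np.sum(inter ** 2) / ((R - 1) * (C - 1) * (L - 1))

    c = (v.size - m) / (n - m)           # correction factor (1.2.1)
    return {"Y": Y, "m": m, "c": c, "theta_L": c * theta_L, "F": ms_L / ms_RCL}


rng = np.random.default_rng(0)
v = rng.normal(size=(3, 4, 5))           # illustrative data, N = 60
mask = np.zeros(v.size, dtype=bool)
mask[rng.choice(v.size, size=40, replace=False)] = True   # n = 40 observed cells
result = two_stage(v, mask.reshape(v.shape))
print(result["m"], result["c"], result["F"])
```

In this example the zero-filled residual array has vanishing row and column totals, which is the numerical counterpart of the remark that effects fitted at the first stage have zero second-stage estimators.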

1.3. The theoretical approach. The statistical model under consideration is as 
follows. Using 3-factor notation for simplicity, we suppose that, corresponding 
to each of the $N$ cells $(i, j, k)$ for $1 \le i \le R$, $1 \le j \le C$ and $1 \le k \le L$, there
is a quantity $v_{ijk}$ which is observed if an experiment is performed at levels $(i, j, k)$.
We also suppose that

(1.3.1)  $v_{ijk} = \nu_{ijk} + e_{ijk}$

where the $\nu_{ijk}$ are non-random quantities and the $e_{ijk}$ are uncorrelated random
variables with common mean zero and variance $\sigma^2$. We shall further assume that
the $e_{ijk}$ are normally distributed only when discussing F-ratios. In the case of
simple random allocation we suppose that a simple random sample without
replacement of $n$ of the $N$ quantities $v_{ijk}$ is observed. Using these data and an
arbitrary member of the class of techniques described above, we compute the
first and second stage linear estimators along with the second stage ratios of
mean squares. Our basic objectives are to find the first and second moments of
these linear estimators and the distributions of the ratios of mean squares. It
should be emphasized that the randomness in these statistics arises from two
sources, first, the discrete randomness induced by the randomization hypotheses,
i.e. the random choice of the subset of $n$ of the $N$ cells, and second, the randomness
induced by the $e_{ijk}$. These sources are assumed throughout to be statistically
independent.

Unfortunately any attempt through mathematical analysis to meet the ob- 
jectives posed runs into great difficulties. It just does not appear feasible to 
compute directly the distributions of the various statistics under the randomiza- 
tion hypothesis. For this reason the main mathematical results of this paper, 
presented in Section 2 and derived in Section 4, refer to a quite different statistical 
model underlying the randomness of the computed statistics. As statistical 
theory relating to a well-defined model these results stand on their own. On the 
other hand these results are intended as approximations to the corresponding 
results for the simple random allocation model. A completely satisfying dis- 
cussion of the accuracy of these approximations is beyond the scope of this 
paper, and indeed seems feasible only by Monte Carlo methods. Thus any inter- 
pretations of the results, such as those given in Section 3, must be regarded as 
suggestions whose verification will require Monte Carlo methods. 
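
A Monte Carlo check of the kind mentioned above can be sketched for the simplest member of the class, in which only the grand mean is fitted at the first stage ($m = 1$). The code below is an illustration under assumptions made here: a two-factor array, the particular fixed quantities `nu`, the value of `sigma`, and the function names are all hypothetical. It averages the corrected second-stage estimator of the first row effect over repeated simple random allocations and compares the average with the corresponding function of the fixed quantities.

```python
import numpy as np


def corrected_row_effect(v, observed, m=1):
    """Corrected second-stage estimate of the row-0 effect, grand mean fitted first."""
    N, n = v.size, int(observed.sum())
    Y = np.zeros(v.shape)
    Y[observed] = v[observed] - v[observed].mean()   # stage 1 residuals, zeros elsewhere
    raw = Y[0].mean() - Y.mean()                     # raw stage 2 estimator
    return (N - m) / (n - m) * raw                   # multiply by c from (1.2.1)


def simulate(nu, n, sigma, reps, seed=0):
    """Average the corrected estimator over random allocations and random errors."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(reps):
        chosen = np.zeros(nu.size, dtype=bool)
        chosen[rng.choice(nu.size, size=n, replace=False)] = True   # n of N cells
        v = nu + sigma * rng.standard_normal(nu.shape)              # v = nu + e
        total += corrected_row_effect(v, chosen.reshape(nu.shape))
    return total / reps


nu = np.array([[1.0, 2.0, 0.0, 1.0],
               [3.0, 2.0, 4.0, 3.0],
               [0.0, 1.0, 1.0, 0.0]])      # illustrative fixed quantities, N = 12
target = nu[0].mean() - nu.mean()          # row-0 effect of the fixed quantities
average = simulate(nu, n=8, sigma=0.5, reps=20000)
print(target, average)
```

Runs of this kind only probe first moments for one small configuration; variances and F-ratios would need the same treatment with many more configurations.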

The second or approximating model replaces the discrete randomization 
probability mechanism by an analogous continuous probability mechanism, and 
henceforth this model will be referred to as the continuous analogue model. The 
detailed definition of the continuous analogue model requires considerable care 
and is postponed to Section 4. Likewise the discussion of the motivation and 
partial justification of this model is deferred to Section 5. 


2. Results. 

2.1. Formulas for means and variances. This discussion will be carried out 
using the terminology and notation of Section 3 of [1] except that we shall now 
use general notation not restricted to a 3-factor case. Thus we suppose that the 
$N$ factor level combinations are labeled 1 to $N$, and we define a Euclidean vector
space $E$ in terms of unit orthogonal basis vectors $V_1, V_2, \cdots, V_N$ which are in
correspondence with the $N$ cells of the basic array. The cells have associated
quantities

(2.1.1)  $v_i = \nu_i + e_i$ for $1 \le i \le N$,

where $v_i$ is observed if the experiment corresponding to cell $i$ is performed.
Formula (2.1.1) is the same as (1.3.1) using different subscripts, so that the $\nu_i$
are fixed quantities and the $e_i$ are uncorrelated random variables with means
zero and variances all $\sigma^2$.

Our aim is to provide unbiased linear estimators for linear combinations of the
$\nu_i$, i.e., for quantities like $\sum_1^N c_i \nu_i$. These quantities may be
regarded as the values of a linear functional $g_1$ over $E$ defined by

(2.1.2)    $g_1(V) = \sum_{i=1}^{N} c_i \nu_i$

where $V = \sum_1^N c_i V_i$ is any vector in $E$. Now the first stage of
estimation, the least squares stage, estimates $g_1(V)$ where $V$ belongs to a
selected m-dimensional subspace of $E$ which we shall denote by $E_I$. The second
stage of estimation estimates $g_1(V)$ where $V$ belongs to the
(N - m)-dimensional subspace of $E$ orthogonal to $E_I$, and this subspace we
shall denote by $E_{II}$. It will be convenient to define $W_1, W_2, \ldots, W_N$
to be an alternative unit orthogonal basis of $E$ such that

    $W_i \in E_I$       for $1 \le i \le m$,
    $W_i \in E_{II}$    for $m + 1 \le i \le N$,

and to denote $g_1(W_i)$ by $\omega_i$. Then the least squares stage of
estimation provides an estimator $\hat\omega_a$ for any
$\omega_a = \sum_1^m a_i \omega_i = g_1(W_a)$, where $W_a = \sum_1^m a_i W_i$ is
any vector in $E_I$. Similarly the second stage provides a raw (i.e., uncorrected
for bias) estimator $\hat\omega_b$ for any
$\omega_b = \sum_{m+1}^N b_i \omega_i = g_1(W_b)$, where
$W_b = \sum_{m+1}^N b_i W_i$ is any vector in $E_{II}$.

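The following formulas are derived in Section 4.3 and are exact formulas
relating to the continuous analogue model. Denote by $A^2$ and $B^2$ the
quantities $\sum_1^m a_i^2$ and $\sum_{m+1}^N b_i^2$, and set

(2.1.3)    $(M\Sigma)_{II} = \frac{1}{N-m} \sum_{i=m+1}^{N} \omega_i^2 .$

Then

(2.1.4)    $\operatorname{ave}\{\hat\omega_a\} = \omega_a ,$

(2.1.5)    $\operatorname{ave}\{(\hat\omega_a)^2\} = \omega_a^2 + \frac{N-n}{n-m-1} A^2 (M\Sigma)_{II} + \frac{N-m-1}{n-m-1} A^2 \sigma^2 ,$

(2.1.6)    $\operatorname{ave}\{\hat\omega_b\} = \frac{n-m}{N-m}\, \omega_b ,$

and

(2.1.7)
$$\operatorname{ave}\{(\hat\omega_b)^2\} = \frac{n-m}{N-m-1} \left[ \left( \frac{n-m+2}{N-m+2} - \frac{1}{N-m} \right) \omega_b^2 + \left( 1 - \frac{n-m+2}{N-m+2} \right) B^2 (M\Sigma)_{II} + \left( 1 - \frac{1}{N-m} \right) B^2 \sigma^2 \right] .$$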




Thus $\hat\omega_a$ is unbiased for $\omega_a$ with variance

(2.1.8)    $\operatorname{var}(\hat\omega_a) = \frac{N-n}{n-m-1} A^2 (M\Sigma)_{II} + \frac{N-m-1}{n-m-1} A^2 \sigma^2 ,$

and $(N - m)\hat\omega_b/(n - m)$ is unbiased for $\omega_b$ with variance

(2.1.9)
$$\operatorname{var}\left\{ \frac{N-m}{n-m}\, \hat\omega_b \right\} = \frac{N-m}{N-m+2} \cdot \frac{N-n}{n-m} \left[ \frac{N-m-2}{(N-m)(N-m-1)}\, \omega_b^2 + \frac{N-m}{N-m-1}\, B^2 (M\Sigma)_{II} \right] + \frac{N-m}{n-m}\, B^2 \sigma^2 .$$

The factor $(N - m)/(n - m)$ applied to $\hat\omega_b$ explains the factor c in
formula (1.2.1). It is worth noting, without actually displaying the formulas,
that the general theory of Section 4.3 also provides formulas for all
covariances among such estimators.
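
As a numerical illustration of (2.1.7)-(2.1.9), the following Python sketch
evaluates the formulas with exact rational arithmetic. The function and argument
names are hypothetical choices made for this illustration and are not notation
from the paper: `ms_II` stands for the quantity defined in (2.1.3), `A2` and `B2`
for the sums of squared coefficients, and the input values are invented. The
final assertion only confirms that (2.1.9) agrees with the variance implied by
(2.1.6) and (2.1.7).

```python
from fractions import Fraction


def var_first_stage(N, n, m, A2, ms_II, sigma2):
    """Variance of the least squares estimator, as in (2.1.8); needs n > m + 1."""
    d = Fraction(n - m - 1)
    return (Fraction(N - n) / d) * A2 * ms_II + (Fraction(N - m - 1) / d) * A2 * sigma2


def raw_second_moment(N, n, m, B2, omega_b, ms_II, sigma2):
    """Average of the squared raw second stage estimator, as in (2.1.7)."""
    p, q = Fraction(n - m), Fraction(N - m)
    r = (p + 2) / (q + 2)
    return (p / (q - 1)) * ((r - 1 / q) * omega_b**2
                            + (1 - r) * B2 * ms_II
                            + (1 - 1 / q) * B2 * sigma2)


def var_second_stage(N, n, m, B2, omega_b, ms_II, sigma2):
    """Variance of the corrected estimator (N-m)/(n-m) * raw, as in (2.1.9)."""
    p, q = Fraction(n - m), Fraction(N - m)
    lead = (q / (q + 2)) * ((q - p) / p)          # (N-m)/(N-m+2) * (N-n)/(n-m)
    bracket = ((q - 2) / (q * (q - 1))) * omega_b**2 + (q / (q - 1)) * B2 * ms_II
    return lead * bracket + (q / p) * B2 * sigma2


# Consistency check with made-up inputs: var = c^2 * E[raw^2] - omega_b^2.
N, n, m = 24, 10, 3
B2, omega_b, ms_II, sigma2 = Fraction(1, 6), Fraction(3, 2), Fraction(5), Fraction(2)
c = Fraction(N - m, n - m)
implied = c**2 * raw_second_moment(N, n, m, B2, omega_b, ms_II, sigma2) - omega_b**2
assert implied == var_second_stage(N, n, m, B2, omega_b, ms_II, sigma2)
```

When n = N every cell is observed, and both variance functions reduce to the
pure error terms, as one would expect.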

For concreteness let us apply these formulas to the specific case discussed in
Section 1.2 of a 3-factor design where grand mean, row main effects and column
main effects are estimated at the first stage. In this case m = R + C - 1 and
$E_I$ is the subspace spanned by the three subspaces, for the grand mean, the
rows ($E_R$) and the columns ($E_C$), defined in Section 3 of [1]. If we revert
to the notation of (1.3.1) we may define $(M\Sigma)_R$, $(M\Sigma)_C$, ...,
$(M\Sigma)_{RCL}$ to be the mean squares arising from the analysis of variance of
the complete array $\nu_{ijk}$. Expressed in terms of these quantities,
$(M\Sigma)_{II}$ of (2.1.3) is given by

(2.1.10)
$$\begin{aligned}
(N - m)(M\Sigma)_{II} = {} & (L - 1)(M\Sigma)_L + (R - 1)(C - 1)(M\Sigma)_{RC} \\
& + (R - 1)(L - 1)(M\Sigma)_{RL} \\
& + (C - 1)(L - 1)(M\Sigma)_{CL} \\
& + (R - 1)(C - 1)(L - 1)(M\Sigma)_{RCL} .
\end{aligned}$$


Now a typical example of an $\omega_a$ to be estimated at the first stage is the
difference of two row main effects, i.e.

    $\omega_a = \frac{1}{CL} \sum_{j=1}^{C} \sum_{k=1}^{L} (\nu_{i_1 jk} - \nu_{i_2 jk}),$

for which $A^2 = 2/CL$, and hence the variance of the least squares estimator is
given by (2.1.8) with $(M\Sigma)_{II}$ given by (2.1.10) and $A^2 = 2/CL$.
Similarly a typical example of an $\omega_b$ to be estimated at the second stage
is the difference of two layer main effects, i.e.,

    $\omega_b = \frac{1}{RC} \sum_{i=1}^{R} \sum_{j=1}^{C} (\nu_{ijk_1} - \nu_{ijk_2}),$

for which $B^2 = 2/RC$ and formula (2.1.9) applies directly to give the variance
of the unbiased estimator of $\omega_b$.

It should be clear from this example how to write down variances for any 
estimator from any particular technique in the class considered, and for an array 
with any number of factors. 

2.2. Significance tests. Consider the $W_i$ and $\omega_i$ introduced above, in
particular those entering at the second stage of analysis where
$m + 1 \le i \le N$. Suppose that $N - m = M_1 + M_2 + M_3$, that we are willing
to assume that $\omega_i = 0$ for $m + M_1 + M_2 + 1 \le i \le N$, and that we
wish to test the null hypothesis that $\omega_i = 0$ for
$m + 1 \le i \le m + M_1$ with $\omega_i$ arbitrary for
$m + M_1 + 1 \le i \le m + M_1 + M_2$. Then, denoting by $\hat\omega_i$ the raw
second stage estimator of $\omega_i$, the natural test statistic for this null
hypothesis is $(MS)_1/(MS)_3$, where

(2.2.1)    $(MS)_1 = \frac{1}{M_1} \sum_{i=m+1}^{m+M_1} (\hat\omega_i)^2$

and

(2.2.2)    $(MS)_3 = \frac{1}{M_3} \sum_{i=m+M_1+M_2+1}^{N} (\hat\omega_i)^2 .$

The main result of Section 4.4 states that, under the continuous analogue
model and assuming normality of the $e_i$ of (2.1.1), $(MS)_1/(MS)_3$ has exactly
the F distribution on $M_1$ and $M_3$ degrees of freedom regardless of the values
of $\omega_i$ for $m + M_1 + 1 \le i \le m + M_1 + M_2$.

Thus, for the standard 3-factor example, if we denote by $(MS)_L$, $(MS)_{RC}$,
..., $(MS)_{RCL}$ the mean squares arising from the second stage analysis of
variance, and if we are willing to assume $(M\Sigma)_{RCL} = 0$, then we may test
the null hypothesis that $(M\Sigma)_L = 0$ by regarding $(MS)_L/(MS)_{RCL}$ as an
F statistic on $L - 1$ and $(R - 1)(C - 1)(L - 1)$ degrees of freedom. Similarly
the second order interactions may be tested against the third order interaction.

Note that it is not true that the numerator and denominator of such a test 
statistic are distributed as multiples of independent $\chi^2$ random variables, but
only that the ratio has the stated F distribution. By more detailed arguments of 
the type given in Section 4.3 we could specify the distributions of the numerator 
and denominator, and we could specify the non-central distribution of the test 
statistic. Since the resulting distributions are not of familiar simple types this 
analysis has not been pursued. We can, however, say something simple which is 
related to the non-central distribution of the test statistic, for formula (2.1.7) 
allows us to write down formulas for the average values of the numerator and 
denominator mean squares. For example 


$$\operatorname{ave}\{(MS)_1\} = \frac{n-m}{N-m-1} \left[ \left( \frac{n-m+2}{N-m+2} - \frac{1}{N-m} \right) (M\Sigma)_1 + \left( 1 - \frac{n-m+2}{N-m+2} \right) (M\Sigma)_{II} + \left( 1 - \frac{1}{N-m} \right) \sigma^2 \right] .$$

Obvious similar formulas hold for $(MS)_2$ and $(MS)_3$, or for $(MS)_L$, ...,
$(MS)_{RCL}$ in the 3-factor example.


3. Interpretations. To show how the formulas of Section 2.1 may bear on a
practical problem we now attempt a comparison between a simple random allocation
scheme and a more orthodox fractional factorial. Suppose $n = 2^r$ observations
are to be allowed on a factorial structure with $2^r - 1$ factors at 2 levels
each, so that

    $N = 2^{2^r - 1} .$

It is well known (cf. [2]) to be possible to determine a fixed fraction of size
$2^r$ with the property that all $2^r - 1$ main effects are unconfounded and
estimable using $2^r - 1$ simple orthogonal linear combinations of the data
points. If such a fraction is chosen, and if, in advance of using it, the labels
of the levels of each factor (i.e., 1 or 2) are assigned at random, then this
design is a random allocation design within the definition given in [1]. On the
other hand one could design a simple random allocation experiment by choosing a
simple random sample without replacement of $2^r$ out of the $2^{2^r - 1}$ factor
level combinations. For values of r in the range r = 4, 5, 6 the question of the
relative performance of these alternative designs is of some practical interest.
From our point of view the way to compare these designs is to compare the
variances of linear unbiased estimators where variances are found by averaging
both over the randomness of the randomization hypothesis and over the randomness
of the "error" superposed on the observations. (The pros and cons of this point
of view were discussed in Section 2 of [1].)

In the case of the fixed fraction there is only one standard method of estimating 
main effects, and the computation of variances for these estimators is easy. For 
the simple random allocation fraction there are many different approaches to 
estimation, for example all of those described in Section 1.2, and the computa- 
tion of variances is difficult. Thus we are obliged to use the approximate formulas 
of Section 2.1 and to treat the results as tentative and subject to checking by 
Monte Carlo methods. The plan is to make a first comparison in terms of (a) an 
initial model underlying the data and (b) an initial method of estimation for the 
unbalanced data. Further comparisons will be made modifying both (a) and (b). 
The initial model is a main effect model, i.e., the model of (2.1.1) with the
restriction that $\nu_i$ is made up only of main effect terms so that

(3.1)    $\nu_i = \mu + \sum_{t=1}^{2^r - 1} (\pm A_t),$

where $A_t$ is the main effect of factor t and the sign of $A_t$ is + or -
according as $v_i$ is an observation at the upper or lower level of factor t. The
initial method of estimation is the simplest practical method in the class
described in Section 1.2, namely the method where only the grand mean $\mu$ is
estimated at the first stage and all other effects are estimated at the second
stage.







Under the initial model the estimator of $A_t$ from the balanced fraction data is

    $\tfrac{1}{2}$ (mean of the $2^{r-1}$ observations at the upper level of
    factor t - mean of the $2^{r-1}$ observations at the lower level of
    factor t).

This clearly has variance

(3.2)    $2^{-r} \sigma^2 .$

Notice that this estimator is unbiased with the given variance conditional on the
particular design chosen under randomization, and so has the same properties
when averaging is carried out over the random choice of design. The comparable
formula for the simple random allocation design is given by (2.1.9) where

    $m = 1, \qquad N = 2^{2^r - 1}, \qquad n = 2^r,$

    $B^2 = \tfrac{1}{4} \cdot N \cdot (2/N)^2 = N^{-1},$

    $B^2 (M\Sigma)_{II} = \frac{1}{N-1} \sum_{u=1}^{2^r - 1} A_u^2 .$

If this formula is simplified by ignoring distinctions between n - m and n,
N - n and N, N - m + 2 and N, etc., then the variance from (2.1.9) is seen to
be approximately

(3.3)    $2^{-r} \left( A_t^2 + \sum_{u=1}^{2^r - 1} A_u^2 + \sigma^2 \right) .$


Now it is clear that the latter variance (3.3) is worse than the former (3.2),
and even that it could be enough worse to destroy the value of the estimator.
This is not the whole story, however, for the proponent of simple random
allocation may argue that he can eliminate from the variance (3.3) as many of the
offending $A_u^2$ terms as he wishes, simply by altering his method of analysis
to fit the corresponding $A_u$ main effects by least squares in the first stage
of analysis. This is true, and corresponds to what would be done in practice, but
two new disadvantages of simple random allocation appear at this point. Firstly,
it will not be known which $A_u$ are offending except from prior beliefs or from
trial analyses of the data. Secondly, there is a price to be paid in additional
variance for the least squares removal of the offending $A_u$ terms. Note that
both (2.1.8) and (2.1.9) contain factors of $(n - m)^{-1}$ or $(n - m - 1)^{-1}$
which were treated as $n^{-1}$ in (3.3). However, if m comes to be an appreciable
fraction h of n, then the variance in (3.3) should be altered not only by
omitting the fitted $A_u^2$ terms but also by multiplying by the factor
$(1 - h)^{-1}$. The first disadvantage is less a criticism of the technique than
it is a statement that trial analyses affect the properties of the technique in
an unknown way. The second disadvantage is more illuminating, and could be
important if a substantial proportion of the main effects were large. The author
believes that, provided we can accept the main effect model, the discussion of
this paragraph gives a clear picture of how the simple random allocation fraction
yields efficiency to the balanced fraction.

However, the proponent of simple random allocation may claim that interaction
effects should not be ignored, and he may insist that we compare variances
after putting in the terms corresponding to interactions. In the case of the
balanced fraction each main effect $A_t$ is confounded with $p$ interaction
effects which may be denoted by $A_{ts}$, for $1 \le t \le 2^r - 1$ and
$1 \le s \le p$, and we may generalize the model in (3.1) to

(3.4)    $\nu_i = \mu + \sum_{t=1}^{2^r - 1} \left( \pm A_t + \sum_{s=1}^{p} \pm A_{ts} \right) .$

When the formula generalizing (3.2) is sought, it is found that, on account of
the confounding, the estimators for the balanced fraction design are not even
unbiased until averaging is carried out over the random choice of design, and
also that the confounded effects enter into the variance in the obvious way
resulting in variance

(3.5)    $\sum_{s=1}^{p} A_{ts}^2 + 2^{-r} \sigma^2 .$

The formula generalizing (3.3) comes as before from (2.1.9), the only difference
being the inclusion of all of the $A^2$ terms in $(M\Sigma)_{II}$. Thus (3.3)
becomes

(3.6)    $2^{-r} \left[ A_t^2 + \sum_{u=1}^{2^r - 1} \left( A_u^2 + \sum_{s=1}^{p} A_{us}^2 \right) + \sigma^2 \right] .$

Formula (3.6) differs from (3.5) in that it has approximately $2^r$ times as many
$A^2$ terms as (3.5), these additional terms being compensated for by the factor
$2^{-r}$. Thus, if interaction effects are entering substantially into a few of
the variances (3.5), then these effects will be greatly spread out when (3.6)
applies, and it is no longer at all clear that the balanced fraction is superior
to the simple random allocation fraction.
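
To make the comparison of (3.2), (3.3), (3.5) and (3.6) concrete, the sketch
below evaluates the approximate variances in Python. The function names and the
numerical effect sizes are hypothetical, chosen only for illustration; the
optional argument `h` applies the inflation factor $(1 - h)^{-1}$ discussed
above for the case where a fraction h of the n observations is used up by first
stage fitting.

```python
def balanced_variance(r, sigma2, aliased_sq=()):
    """(3.5): squared aliased interactions plus sigma^2 / 2^r; gives (3.2) if none."""
    return sum(aliased_sq) + sigma2 / 2**r


def random_allocation_variance(r, sigma2, target_sq, unfitted_sq, h=0.0):
    """(3.3)/(3.6): 2^{-r} (A_t^2 + sum of unfitted squared effects + sigma^2),
    inflated by 1/(1 - h) when a fraction h of n is spent on first stage fits."""
    return (target_sq + sum(unfitted_sq) + sigma2) / 2**r / (1.0 - h)


# Hypothetical case: r = 4 (15 factors, n = 16), one large main effect A_u = 2,
# and an estimated main effect A_t that is itself zero.
r, sigma2 = 4, 1.0
effects_sq = [4.0] + [0.0] * 14
v_bal = balanced_variance(r, sigma2)
v_rand = random_allocation_variance(r, sigma2, 0.0, effects_sq)
```

In this invented case the single large main effect multiplies the variance under
simple random allocation fivefold relative to the balanced fraction, which is the
effect the paragraph following (3.3) describes.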


4. The theory of the continuous analogue. 

4.1. Geometrical considerations. We now describe the general two stage
estimation procedure in geometrical terms, following the notation of Section 3
of [1] as introduced in Section 2.1 of this paper. We suppose that a linear
functional $f_1$ over $E$ is defined by

(4.1.1)    $f_1\left( \sum_{i=1}^{N} c_i V_i \right) = \sum_{i=1}^{N} c_i v_i ,$

where $\sum_1^N c_i V_i$ is any vector in $E$. The data provide the values of
$f_1(V)$ for any vector $V$ in the n-dimensional subspace $E_D$ of $E$ spanned by
the $V_i$ corresponding to the n observed cells. In fact, we may regard the
information in the data as providing the functional $f_{1D}$, where

(4.1.2)    $f_{1D}(V) = f_1(V)$ for $V \in E_D$,
           $f_{1D}(V) = 0$ for $V \perp E_D$.

The first stage of estimation provides the least squares estimators
$\hat\omega_a$ of $\omega_a = g_1(W_a)$ for any $W_a$ in $E_I$, and the second
stage of estimation provides raw estimators $\hat\omega_b$ of
$\omega_b = g_1(W_b)$ for any $W_b$ in $E_{II}$. Our immediate purpose is to
characterize vectors $Z_a$ and $Z_b$, both in $E_D$, with the properties that

(4.1.3)    $\hat\omega_a = f_1(Z_a)$ and $\hat\omega_b = f_1(Z_b)$.


As a preliminary we define a one-to-one correspondence between linear
functionals $f$ and vectors $F$. Given any linear functional $f$ over $E$ defined
by

    $f(U_i) = u_i$

for $1 \le i \le N$, where the $U_i$ are a unit orthogonal basis of $E$, we may
define an associated vector $F$ as the vector with components $u_i$ relative to
the basis $U_i$. Conversely, from $F$ one may recover $f$. It may be easily
checked that the correspondence thus defined does not depend on the particular
choice of basis $U_i$. It is also clear that, if $f_1$ and $f_2$ are two
functionals with corresponding vectors $F_1$ and $F_2$, then $a_1 f_1 + a_2 f_2$
has corresponding vector $a_1 F_1 + a_2 F_2$. Thus the set of all linear
functionals over $E$ and the set of all vectors in $E$ form isomorphic vector
spaces in an obvious manner and we may use interchangeable languages. For
example, we may regard the information in the data as $f_{1D}$ or its
corresponding $F_{1D}$.

Now the well-known geometrical interpretation of least squares (cf. Scheffé
[3], p. 12) uses vector language and states that the process of least squares
fitting is equivalent to splitting $F_{1D}$ into

(4.1.4)    $F_{1D} = F_{1F} + F_{1R}$

where $F_{1F}$ and $F_{1R}$ are both in $E_D$ but $F_{1R}$ is in
$E_D \cap E_{II}$ and $F_{1F}$ is perpendicular to $E_D \cap E_{II}$. $F_{1F}$
represents the fitted variation and $F_{1R}$ represents the residual variation.
However the complete solution of the least squares problem requires the
determination of a functional $f_T$ such that

(4.1.5)    $f_T(W_a) = \hat\omega_a$ for $W_a \in E_I$,
           $f_T(V) = 0$ for $V \in E_{II}$.

The crucial property which this functional must possess is that it must reproduce
the fitted variation on $E_D$, i.e., we must define $f_T$ satisfying (4.1.5) and

(4.1.6)    $f_T(V) = f_{1F}(V)$ for $V \in E_D$,

where $f_{1F}$ is the functional corresponding to the vector $F_{1F}$.







Given any $V$ in $E_D$ we may remove its component along $E_D \cap E_{II}$ and
have left a vector $Z_a$ in $E_D$ but orthogonal to $E_D \cap E_{II}$. Then it is
clear that

(4.1.7)    $f_{1F}(Z_a) = f_{1D}(Z_a) = f_{1F}(V).$

From $Z_a$ we may further remove its remaining component along $E_{II}$ and have
left a vector $W_a$ in $E_I$. Conversely, given such a $W_a$ we can determine
uniquely its corresponding $Z_a$ as follows: $Z_a$ is that vector in $E_{Da}$
which (i) differs from $W_a$ by a vector in $E_{II}$ and which (ii) is minimum
distance from $W_a$ subject to (i). Here $E_{Da}$ is the subspace of $E_D$ formed
by the intersection of $E_D$ with the space spanned by $W_a$ and $E_{II}$. This
situation is pictured in Figure 1. If we assume that every $\omega_a$ is
estimable, i.e., that no vector in $E_I$ is orthogonal to $E_D$, then every $W_a$
in $E_I$ can be reached in this way by some $Z_a$ in $E_D$ and we have a
one-to-one linear correspondence between every vector $W_a$ in $E_I$ and its
corresponding $Z_a$. If we now define

(4.1.8)    $f_T(W_a) = f_{1F}(Z_a)$ for $W_a \in E_I$,
           $f_T(V) = 0$ for $V \in E_{II}$,

then it follows from (4.1.7) and (4.1.8) that

    $f_T(V) = f_T(W_a) = f_{1F}(Z_a) = f_{1F}(V)$

for any $V$ in $E_D$, as required by (4.1.6), and we conclude that

    $\hat\omega_a = f_1(Z_a)$

is the least squares estimator of $\omega_a = g_1(W_a)$.

The situation with the second stage estimators is simpler, for the raw estimator
of $\omega_b = g_1(W_b)$ is simply $f_{1R}(W_b)$ where $F_{1R}$ is defined in
(4.1.4). Since $f_{1R}(V) = 0$ for any $V$ in $E_I$ it is clear, as stated in
Section 1.2, that the second stage analysis of variance simply gives zero for
those effects $\omega_a$ estimated at the first stage. Also, if $Z_b$ is defined
to be the component of $W_b$ in $E_D \cap E_{II}$, then

    $f_{1R}(W_b) = f_{1R}(Z_b) = f_{1D}(Z_b) = f_1(Z_b),$

so that

    $\hat\omega_b = f_1(Z_b),$

where $\hat\omega_b$ is the raw second stage estimator of $\omega_b = g_1(W_b)$.

4.2. The model for the continuous analogue. The formulas (4.1.3) indicate how
the estimators $\hat\omega_a$ and $\hat\omega_b$ may be expressed in terms of the
underlying functional $f_1$ and the subspace $E_D$, where $E_D$ is at the choice
of the experimenter. The subspace $E_D$ used in an actual experiment is
necessarily one of the discrete set of $\binom{N}{n}$ subspaces $E_D$ determined
by which n of the N cells are used. No other $E_D$ can be observed with n
observations. However one can postulate a model under which $f_1$ is observed on
other n-dimensional subspaces with positive probability, and (4.1.3) provides
reasonable definitions of $\hat\omega_a$ and $\hat\omega_b$ for such a model.







In the simple random allocation scheme the random subspace $E_D$ is that
spanned by a simple random sample of n of the N unit vectors
$V_1, V_2, \ldots, V_N$. In the continuous analogue model all directions, not
just those of a unit orthogonal set, are regarded as "equally likely" in the
following sense: $E_D$ is taken to be that random n-dimensional subspace of $E$
which is spanned by n independent spherically distributed vectors. A random
vector in $E$ is said to be spherically distributed if the distribution of its
direction is invariant under any orthogonal transformation of $E$ leaving the
origin fixed. The simplest analytical realization of a spherically distributed
vector is a vector whose components relative to a unit orthogonal coordinate
system are independently N(0, 1). The random subspace $E_D$ defined in this way
may also be called a spherically distributed random subspace of dimension n.

The author believes that, for reasonably large N and n, the distributions of
$\hat\omega_a$ and $\hat\omega_b$ for the simple random allocation scheme may be
reasonably well approximated by the corresponding distributions under the
continuous analogue model. This issue was discussed briefly in Section 1.3 and
is discussed further in Section 5. Sections 4.3 and 4.4 derive certain
distribution properties of the continuous analogue model.
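
The definition of a spherically distributed random subspace can be sketched in a
few lines of Python: draw n vectors with independent N(0, 1) components and
orthonormalize them. The function names are hypothetical, and Gram-Schmidt
orthonormalization is simply one convenient way (an assumption of this sketch,
not something prescribed in the paper) to obtain a unit orthogonal basis of the
random subspace.

```python
import math
import random


def spherical_subspace_basis(N, n, rng):
    """Orthonormal basis of the span of n independent N(0, I_N) vectors."""
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(N)]
        for b in basis:                      # remove components along earlier vectors
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:                     # skip a (practically impossible) degenerate draw
            basis.append([x / norm for x in v])
    return basis


def squared_cosine_to_axis(basis, i):
    """cos^2 of the angle between the coordinate axis V_i and the subspace."""
    return sum(b[i] ** 2 for b in basis)


rng = random.Random(1961)                    # arbitrary seed for reproducibility
basis = spherical_subspace_basis(N=8, n=3, rng=rng)
total = sum(squared_cosine_to_axis(basis, i) for i in range(8))  # equals n up to rounding
```

Averaged over many draws, `squared_cosine_to_axis` should be close to n/N, the
m = 0 case of the average of the squared cosine used in Section 4.3.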


FIGURE 1.
Geometrical picture of $Z_a$, $Z_b$, $W_a$, $W_b$ and various subspaces.


4.3. Derivation of formulas. Formulas are now derived for the first and second
moments of $\hat\omega_a$ and $\hat\omega_b$ under the continuous analogue model.
For given $\omega_a = \sum_1^m a_i \omega_i = g_1(W_a)$ and
$\omega_b = \sum_{m+1}^N b_i \omega_i = g_1(W_b)$ we introduce a new basis
$U_1, \ldots, U_N$ of unit orthogonal vectors in $E$ with the properties that

    $W_a = A U_1$,      and $U_i \in E_I$ for $1 \le i \le m$,
    $W_b = B U_{m+1}$,  and $U_i \in E_{II}$ for $m + 1 \le i \le N$.

Here, as before, $A^2 = \sum_1^m a_i^2$ and $B^2 = \sum_{m+1}^N b_i^2$. Figure 1
gives a picture of the various relevant vectors and subspaces of $E$, except that
$E_{Da}$, $E_D \cap E_{II}$ and $E_{II}$ are pictured as having 2, 1 and 2
dimensions rather than n - m + 1, n - m and N - m dimensions respectively.







Suppose $Z_a$ makes angle $\Phi$ with $U_1$. The spherical distribution of $E_D$
in $E$ induces a spherical distribution of $E_{Da}$ in the space spanned by $U_1$
and $E_{II}$ so that $\Phi$ is distributed like the angle between a spherically
random (n - m + 1)-plane and a fixed direction in (N - m + 1)-space. This is the
same as the distribution of the angle between a fixed (n - m + 1)-plane and a
spherically random direction, i.e.,

    $\cos^2 \Phi \sim \beta_{\frac{1}{2}(n-m+1),\, \frac{1}{2}(N-n)} \qquad (0 \le \Phi \le \tfrac{1}{2}\pi).$

Similarly if $\theta$ is the angle from $Z_b$ to $U_{m+1}$, then

    $\cos^2 \theta \sim \beta_{\frac{1}{2}(n-m),\, \frac{1}{2}(N-n)} \qquad (0 \le \theta \le \tfrac{1}{2}\pi)$

and this is valid conditional on any given $\Phi$ so that $\theta$ and $\Phi$ are
independent.
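
The averages of trigonometric functions needed later in this section follow from
standard moments of the beta distribution. As a hedged illustration, the Python
sketch below computes them exactly; `beta_moment` and `trig_averages` are
hypothetical helper names, and the closed forms used are the standard results
that the k-th moment of a beta variable with parameters alpha and beta is the
product of (alpha + j)/(alpha + beta + j) over j = 0, ..., k - 1, and that the
mean of (1 - X)/X is beta/(alpha - 1) when alpha > 1.

```python
from fractions import Fraction


def beta_moment(alpha, beta, k):
    """E[X^k] for X ~ Beta(alpha, beta), k a nonnegative integer."""
    out = Fraction(1)
    for j in range(k):
        out *= (alpha + j) / (alpha + beta + j)
    return out


def trig_averages(N, n, m):
    """Averages of cos^2(theta), cos^4(theta), cos^2 sin^2(theta), tan^2(Phi)."""
    a_theta = Fraction(n - m, 2)
    a_phi = Fraction(n - m + 1, 2)
    b = Fraction(N - n, 2)
    cos2 = beta_moment(a_theta, b, 1)
    cos4 = beta_moment(a_theta, b, 2)
    tan2 = b / (a_phi - 1)      # E[(1 - X)/X] with X = cos^2(Phi); needs n > m + 1
    return cos2, cos4, cos2 - cos4, tan2


cos2, cos4, cos2sin2, tan2 = trig_averages(N=20, n=8, m=2)
```

For these illustrative values the four averages equal (n - m)/(N - m),
(n - m)(n - m + 2)/((N - m)(N - m + 2)), (n - m)(N - n)/((N - m)(N - m + 2)) and
(N - n)/(n - m - 1) respectively.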

Now we may write $Z_a = A U_1 + A \tan\Phi\, Z_1$ and
$Z_b = B \cos^2\theta\, U_{m+1} + B \cos\theta \sin\theta\, Z_2$ where $Z_1$ is a
unit vector in $E_{II}$ orthogonal to $Z_b$ and $Z_2$ is a unit vector in
$E_{II}$ orthogonal to $U_{m+1}$. Since
$\hat\omega_a + \hat\omega_b = f_1(Z_a + Z_b) = A f_1(U_1) + A \tan\Phi\, f_1(Z_1)
+ B \cos^2\theta\, f_1(U_{m+1}) + B \cos\theta \sin\theta\, f_1(Z_2)$, the only
unspecified random elements in the expression for this statistic are $f_1(Z_1)$
and $f_1(Z_2)$. The marginal distribution of $Z_1$ given $\theta$ and $\Phi$ but
not $Z_2$ is simply spherical in $E_{II}$, and the marginal distribution of $Z_2$
given $\theta$ and $\Phi$ but not $Z_1$ is spherical in the
(N - m - 1)-dimensional subspace of $E_{II}$ orthogonal to $U_{m+1}$. The joint
distribution of $Z_1$ and $Z_2$ is much more difficult to specify but for
purposes of second joint moments this is not necessary. For suppose $Z_1$ makes
angle $\xi_i$ with $U_i$ ($i = m + 1, \ldots, N$) and $Z_2$ makes angle $\eta_i$
with $U_i$ ($i = m + 2, \ldots, N$). Since $\cos \xi_i$ or $\cos \eta_i$ are
symmetrically distributed about 0 their averages are 0. Similarly, given
$\xi_i$ the unknown conditional distribution of $\cos \xi_j$ or $\cos \eta_j$ is
still symmetrical about 0 and thence the following relations hold:

    $\operatorname{ave}\{\cos \xi_i \cos \xi_j\} = 0$ if $i \ne j$,
    $\operatorname{ave}\{\cos \xi_i \cos \eta_j\} = 0$,
    $\operatorname{ave}\{\cos \eta_i \cos \eta_j\} = 0$ if $i \ne j$.

The marginal distributions of $\cos \xi_i$ and $\cos \eta_j$ are given by

    $\cos^2 \xi_i \sim \beta_{\frac{1}{2},\, \frac{1}{2}(N-m-1)}$
    $\cos^2 \eta_j \sim \beta_{\frac{1}{2},\, \frac{1}{2}(N-m-2)} .$

If we set $f_1(U_i) = u_i$ for $i = 1, \ldots, N$ we have

    $f_1(Z_1) = \sum_{i=m+1}^{N} u_i \cos \xi_i$

    $f_1(Z_2) = \sum_{i=m+2}^{N} u_i \cos \eta_i .$







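Thus we can write

$$\hat\omega_a + \hat\omega_b = A u_1 + A \tan\Phi \sum_{i=m+1}^{N} u_i \cos \xi_i + B \cos^2\theta\, u_{m+1} + B \cos\theta \sin\theta \sum_{i=m+2}^{N} u_i \cos \eta_i .$$

Now we are in a position to average over the randomness induced by random
$E_D$, regarding $f_1$ as fixed. Since $\theta$ and $\Phi$ are independent of the
$\xi_i$ and $\eta_i$ we need only replace the trigonometric functions by their
averages. Thus

$$\operatorname{ave}\{\hat\omega_a + \hat\omega_b\} = A u_1 + B u_{m+1} \operatorname{ave}\{\cos^2\theta\} = w_a + \frac{n-m}{N-m}\, w_b ,$$

where $w_a = f_1(W_a)$ and $w_b = f_1(W_b)$. Also

(4.3.1)
$$\begin{aligned}
\operatorname{ave}\{(\hat\omega_a + \hat\omega_b)^2\}
&= A^2 u_1^2 + 2 A u_1 B u_{m+1} \operatorname{ave}\{\cos^2\theta\} + B^2 u_{m+1}^2 \operatorname{ave}\{\cos^4\theta\} \\
&\quad + A^2 \operatorname{ave}\{\tan^2\Phi\} \sum_{i=m+1}^{N} u_i^2 \operatorname{ave}\{\cos^2 \xi_i\}
 + B^2 \operatorname{ave}\{\cos^2\theta \sin^2\theta\} \sum_{i=m+2}^{N} u_i^2 \operatorname{ave}\{\cos^2 \eta_i\} \\
&= A^2 u_1^2 + 2\, \frac{n-m}{N-m}\, A u_1 B u_{m+1} + \frac{(n-m)(n-m+2)}{(N-m)(N-m+2)}\, B^2 u_{m+1}^2 \\
&\quad + A^2\, \frac{N-n}{n-m-1} \cdot \frac{1}{N-m} \sum_{i=m+1}^{N} u_i^2
 + B^2\, \frac{(n-m)(N-n)}{(N-m)(N-m+2)} \cdot \frac{1}{N-m-1} \sum_{i=m+2}^{N} u_i^2 \\
&= w_a^2 + 2\, \frac{n-m}{N-m}\, w_a w_b + \frac{(n-m)(n-m+2)}{(N-m)(N-m+2)}\, w_b^2 + A^2\, \frac{N-n}{n-m-1}\, (MS)_{II} \\
&\quad + \frac{(n-m)(N-n)}{(N-m-1)(N-m+2)} \left[ B^2 (MS)_{II} - \frac{1}{N-m}\, w_b^2 \right] \\
&= w_a^2 + 2\, \frac{n-m}{N-m}\, w_a w_b + \frac{n-m}{N-m-1} \left( \frac{n-m+2}{N-m+2} - \frac{1}{N-m} \right) w_b^2 \\
&\quad + A^2\, \frac{N-n}{n-m-1}\, (MS)_{II} + B^2\, \frac{n-m}{N-m-1} \left( 1 - \frac{n-m+2}{N-m+2} \right) (MS)_{II} ,
\end{aligned}$$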




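where

    $(MS)_{II} = \frac{1}{N-m} \sum_{i=m+1}^{N} w_i^2 .$

Finally we average over the randomness of $f_1$, i.e.,

    $\operatorname{ave}\{w_a\} = \omega_a, \qquad \operatorname{ave}\{w_b\} = \omega_b,$
    $\operatorname{ave}\{w_a^2\} = \omega_a^2 + A^2 \sigma^2, \qquad \operatorname{ave}\{w_a w_b\} = \omega_a \omega_b,$
    $\operatorname{ave}\{w_b^2\} = \omega_b^2 + B^2 \sigma^2$ and $\operatorname{ave}\{(MS)_{II}\} = (M\Sigma)_{II} + \sigma^2$

where

(4.3.2)    $(M\Sigma)_{II} = \frac{1}{N-m} \sum_{i=m+1}^{N} \omega_i^2 .$

Thus

(4.3.3)    $\operatorname{ave}\{\hat\omega_a + \hat\omega_b\} = \omega_a + \frac{n-m}{N-m}\, \omega_b$

and

(4.3.4)
$$\begin{aligned}
\operatorname{ave}\{(\hat\omega_a + \hat\omega_b)^2\}
&= \omega_a^2 + 2\, \frac{n-m}{N-m}\, \omega_a \omega_b + \frac{n-m}{N-m-1} \left( \frac{n-m+2}{N-m+2} - \frac{1}{N-m} \right) \omega_b^2 \\
&\quad + A^2\, \frac{N-n}{n-m-1}\, (M\Sigma)_{II} + B^2\, \frac{n-m}{N-m-1} \left( 1 - \frac{n-m+2}{N-m+2} \right) (M\Sigma)_{II} \\
&\quad + \left( 1 + \frac{N-n}{n-m-1} \right) A^2 \sigma^2 + \frac{n-m}{N-m-1} \left( 1 - \frac{1}{N-m} \right) B^2 \sigma^2 .
\end{aligned}$$

Formulas (2.1.4) and (2.1.6) are simply special cases of (4.3.3) and formulas
(2.1.5) and (2.1.7) are simply special cases of (4.3.4). Note also that, if
$\omega_{a'}$ and $\omega_{b'}$ are alternative parameters estimated by
$\hat\omega_{a'}$ and $\hat\omega_{b'}$ at the first and second stages, then from
formulas (4.3.3) and (4.3.4) we can find
$\operatorname{var}(\hat\omega_a + \hat\omega_b)$,
$\operatorname{var}(\hat\omega_{a'} + \hat\omega_{b'})$ and
$\operatorname{var}([\hat\omega_a + \hat\omega_b] + [\hat\omega_{a'} + \hat\omega_{b'}])$
and hence deduce
$\operatorname{cov}(\hat\omega_a + \hat\omega_b,\ \hat\omega_{a'} + \hat\omega_{b'})$.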

4.4. The distribution of the ratio of mean squares. The purpose of this section
is to prove the following theorem. Suppose the random functional $f_1$ is defined
by (4.1.1) and (2.1.1) where the $e_i$ are normally distributed. Suppose $E_D$ is
spherically random according to the continuous analogue model. Suppose
$\hat\omega_i$ is the raw second stage estimator of $\omega_i$ for
$m + 1 \le i \le N$ and $(MS)_1$ and $(MS)_3$ are as defined in (2.2.1) and
(2.2.2). Suppose, according to the null hypothesis, that $\omega_i = 0$ for
$m + 1 \le i \le m + M_1$ and $m + M_1 + M_2 + 1 \le i \le N$, but that
otherwise the $\omega_i$ are arbitrary. Then $(MS)_1/(MS)_3$ has the F
distribution on $M_1$ and $M_3$ degrees of freedom.

Denote by $E_1$ the $M_1$-dimensional subspace of $E_{II}$ spanned by $W_i$ for
$m + 1 \le i \le m + M_1$, by $E_3$ the $M_3$-dimensional subspace of $E_{II}$
spanned by $W_i$ for $m + M_1 + M_2 + 1 \le i \le N$, and by $E_{13}$ the space
spanned by $E_1$ and $E_3$ together. Denote by $F_{13}$ the vector in $E_{13}$
whose components are $\hat\omega_i$ along $W_i$ for $m + 1 \le i \le m + M_1$ and
$m + M_1 + M_2 + 1 \le i \le N$, i.e., $F_{13}$ is the component of $F_{1R}$ of
(4.1.4) in $E_{13}$. Under the hypothesis of the theorem, $f_1(W_i) = d_i$, where
the $d_i$ are independently $N(0, \sigma^2)$ for $m + 1 \le i \le m + M_1$ and
$m + M_1 + M_2 + 1 \le i \le N$, so that the distribution of $f_1$ is invariant
under any orthogonal transformation of $E_{13}$. Similarly the distribution of
$E_D$ is invariant under any orthogonal transformation of $E_{13}$, and $E_D$ and
$f_1$ are assumed independent. Since $f_1$ and $E_D$ determine $F_{13}$ it
follows that $F_{13}$ is spherically distributed in $E_{13}$, and hence that

$$\frac{(MS)_1}{(MS)_3} = \frac{M_3\, (\text{component of } F_{13} \text{ in } E_1)^2}{M_1\, (\text{component of } F_{13} \text{ in } E_3)^2}$$

has an F distribution on $M_1$ and $M_3$ degrees of freedom.


5. Discussion of the continuous analogue model. In constructing the continuous
analogue model an arbitrary choice was made, namely the choice that, under the
continuous model, $E_D$ should be the subspace spanned by n independent sample
vectors from some multivariate normal distribution over $E$. This choice was
made partly for mathematical convenience and partly because of the author's not
infallible intuition in N dimensions. The particular choice of the spherical
multivariate normal distribution was dictated by the requirement that, as with
simple random allocation, the distribution of $E_D$ should be invariant under all
N! permutations of the coordinate axes $V_i$.

For moderately large N and n this particular continuous analogue has some
intuitive appeal as an approximation to simple random allocation, for a discrete
distribution with a large number $\binom{N}{n}$ of equi-probable n-spaces and
with symmetry under any permutation of the $V_i$ is approximated by a continuous
distribution with analogous continuous uniformity and symmetry properties. Of
course, intuition in N-dimensional space is uncertain. Certainly, statistics can
be found whose discrete and approximating continuous distributions under the
randomization hypothesis are quite different, especially in the sense that the
discrete distribution is very discrete and so not fitted well by any continuous
distribution. For example,

    $f_{1D}(V_1) = v_1$ with probability $n/N$
                 $= 0$ with probability $1 - (n/N)$

under the discrete hypothesis, but under the continuous hypothesis is fitted with
a continuous distribution with some dependence on $v_2, \ldots, v_N$ as well as
on $v_1$. However, the real issue for our purposes is whether the approximation
is adequate for first stage estimators $\hat\omega_a$, second stage estimators
$\hat\omega_b$, or second stage ratios of mean squares. This issue is, for the
most part, beyond the scope of this paper.

One comparison between discrete and continuous case formulas is easy, and
this we now carry out. We now compute the discrete case formulas analogous to
(2.1.6) and (2.1.7) appropriate for the special method of data analysis where the
first stage of analysis is empty, i.e., m = 0. In this case every $\nu_i$ and all
linear combinations $\sum_1^N c_i \nu_i$ are particular cases of $\omega_b$
parameters, and $\hat\omega_b = \sum_1^N c_i \hat v_i$ where

    $\hat v_i = v_i$ if the ith cell is observed
              $= 0$ otherwise.

Under simple random sampling

    $\operatorname{ave}\{\hat v_i\} = \frac{n}{N}\, v_i, \qquad \operatorname{ave}\{(\hat v_i)^2\} = \frac{n}{N}\, v_i^2,$

and

    $\operatorname{ave}\{\hat v_i \hat v_j\} = \frac{n(n-1)}{N(N-1)}\, v_i v_j$ for $i \ne j$.

Thus, under the randomization hypothesis,

    $\operatorname{ave}\{\hat\omega_b\} = \frac{n}{N} \sum_{i=1}^{N} c_i v_i$

and

$$\begin{aligned}
\operatorname{ave}\{(\hat\omega_b)^2\}
&= \frac{n}{N} \sum_{i=1}^{N} c_i^2 v_i^2 + \frac{n(n-1)}{N(N-1)} \sum_{i \ne j} c_i c_j v_i v_j \\
&= \frac{n(n-1)}{N(N-1)} \left( \sum_{i=1}^{N} c_i v_i \right)^2 + \frac{n(N-n)}{N(N-1)} \sum_{i=1}^{N} c_i^2 v_i^2 .
\end{aligned}$$

Further averaging over the randomness of $v_i = \nu_i + e_i$, we get

(5.1)    $\operatorname{ave}\{\hat\omega_b\} = \frac{n}{N}\, \omega_b$

and

(5.2)    $\operatorname{ave}\{(\hat\omega_b)^2\} = \frac{n}{N} \left[ \frac{n-1}{N-1}\, \omega_b^2 + \left( 1 - \frac{n-1}{N-1} \right) \sum_{i=1}^{N} c_i^2 \nu_i^2 + B^2 \sigma^2 \right]$

where $B^2 = \sum_1^N c_i^2$. If m = 0 is substituted in (2.1.6) and (2.1.7) it
is seen that (5.1) agrees with (2.1.6), thereby justifying the correction factor
c = N/n in this case, but that (5.2) differs from (2.1.7) in two ways. Firstly,
the coefficients depending on N and n differ, but their ratios approach unity as
n and N increase. Secondly the expression $B^2 (M\Sigma)_{II}$ in (2.1.7) is
replaced by $\sum_1^N c_i^2 \nu_i^2$ in (5.2). The second difference is due to
the symmetrizing effect of the continuous analogue model. If symmetrized average
squares or symmetrized variances are considered as in Section 6 of [1], then the
difference disappears. In particular, when the basic array consists of k factors
at two levels each, i.e., $N = 2^k$, and when main effects or interaction terms
are estimated, then the $c_i^2$ are all identical and so
$B^2 (M\Sigma)_{II} = \sum_1^N c_i^2 \nu_i^2$. In this case variance and
symmetrized variance are the same.
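
The randomization averages for the m = 0 case, before the further averaging that
leads to (5.1) and (5.2), can be checked by brute force on a small array. The
Python sketch below enumerates every simple random sample of n cells and compares
the resulting moments with the closed forms; the function names and the numerical
values of the coefficients and observations are hypothetical, chosen only for
illustration.

```python
from fractions import Fraction
from itertools import combinations


def enumerated_moments(c, v, n):
    """First two moments of sum_i c_i * vhat_i over all n-subsets of the cells."""
    N = len(v)
    values = [sum(Fraction(c[i]) * v[i] for i in cells)
              for cells in combinations(range(N), n)]
    first = sum(values) / len(values)
    second = sum(x * x for x in values) / len(values)
    return first, second


def formula_moments(c, v, n):
    """Closed forms for the same two moments under simple random sampling."""
    N = len(v)
    lin = sum(Fraction(ci) * vi for ci, vi in zip(c, v))
    sq = sum(Fraction(ci) ** 2 * vi ** 2 for ci, vi in zip(c, v))
    first = Fraction(n, N) * lin
    second = (Fraction(n * (n - 1), N * (N - 1)) * lin ** 2
              + Fraction(n * (N - n), N * (N - 1)) * sq)
    return first, second


c = [1, -1, 2, 0, 3]     # made-up coefficients c_i
v = [2, 5, -1, 4, 3]     # made-up observations v_i
assert enumerated_moments(c, v, 2) == formula_moments(c, v, 2)
```

Exact rational arithmetic is used so that agreement is an identity rather than a
numerical coincidence; the enumeration is feasible only for small N.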

Further analytical comparisons of the above type become very complex; even 
the trivial case of least squares estimation of the grand mean with m = 1 results 


in very messy algebraic detail. Spot-checking by Monte Carlo seems to be the 
only method available. 


6. Details on the class of data analysis techniques. It was stated in Section
1.1 that the class of techniques described in Section 2.1 falls within the more
general class described in [1]. Furthermore we have tacitly assumed throughout
that estimators like $\hat\omega_a$ and $\hat\omega_b$ are unbiased except for
constant scale factors. This is true for any type of random allocation design,
with the justification following from the theory of [1] together with a proof
that the methods of Section 1.2 are covered by the general methods of [1].

The required proof now follows. Suppose, as before, that $W_a$ and $W_b$ are
general vectors in $E_I$ and $E_{II}$ respectively, and consider the statistic
$\hat\omega_a + \hat\omega_b$. We claim that this statistic is a special case of
what was referred to in [1] as a $\lambda$-minimum extension, i.e.,

    $\hat\omega_a + \hat\omega_b = f_\lambda(W_a + W_b),$

for a particular choice of $\lambda$-metric. In [1] unbiased estimators were
defined from $f_\lambda$ just as we have defined them from $\hat\omega_a$ and
$\hat\omega_b$ in this paper.

Consider the second characterization of $f_\lambda$ given in Section 4.2 of [1].
Consider also the characterization of $\hat\omega_a$ and $\hat\omega_b$ in
(4.1.3) of this paper. These two characterizations coincide for the limiting
choice of $\lambda$-metric where the $\lambda$-values corresponding to
$W_1, \ldots, W_m$ all tend to $\infty$, and the $\lambda$-values corresponding
to $W_{m+1}, \ldots, W_N$ are all equal to unity. For $Z_a + Z_b$ can be
characterized as that vector in $E_D$ which is at minimum distance from
$W_a + W_b$, subject to the condition that the components corresponding to
$\lambda = \infty$ are not allowed to count, i.e., $Z_a + Z_b$ is that vector in
$E_D$ nearest to $W_a + W_b$ subject to the condition that
$Z_a + Z_b - W_a - W_b$ has zero component along $E_I$. This completes the
proof.


REFERENCES 
[1] A. P. Dempster, "Random allocation designs I: on general classes of estimation
methods," Ann. Math. Stat., Vol. 31 (1960), pp. 885-905.
[2] R. L. Plackett and J. P. Burman, "The design of optimum multifactor experiments,"
Biometrika, Vol. 33 (1946), pp. 305-325.
[3] Henry Scheffé, The Analysis of Variance, John Wiley and Sons, New York, 1959.
[4] S. S. Wilks, Mathematical Statistics, Princeton University Press, Princeton, 1943.





SAMPLING MOMENTS OF MEANS FROM FINITE MULTIVARIATE 
POPULATIONS¹


By D. W. BEHNKEN 
American Cyanamid Company, Stamford, Conn. 


Summary. A method is described for deriving the sampling moments of means 
of random vectors obtained by sampling without replacement from a finite 
k-variate population of n vector members. A table of results is presented listing 
the moments of order less than or equal to six as a function of the population 
moments. These moments were originally derived, in a less general form, in the 
course of developing the Simplex-Sum Designs discussed in [1]. Their possible 
wider applicability to sampling problems, however, motivated the extension of 
the work to the general formulas given here. 


Notation and Description of the Method. The n vectors comprising a finite
k-variate population will be denoted by

$$x_u = (x_{1u}, x_{2u}, \ldots, x_{ku}), \qquad u = 1, 2, \ldots, n,$$

and the population moments by

$$\langle 1^{a_1} 2^{a_2} \cdots k^{a_k} \rangle' = \frac{1}{n} \sum_{u=1}^{n} x_{1u}^{a_1} x_{2u}^{a_2} \cdots x_{ku}^{a_k}.$$

The order of a moment is given by $a = \sum a_i$.

We are concerned with the sampling moments of the mean vector
$(\bar x_1, \bar x_2, \ldots, \bar x_k)$ of s vectors $x_{u_1}, x_{u_2}, \ldots, x_{u_s}$
randomly chosen from the population without replacement, viz.,
$(\bar x_1, \bar x_2, \ldots, \bar x_k) = s^{-1} \sum_{i=1}^{s} x_{u_i}$. The sampling moments are
written as $\mathrm{Ave}[\bar x_1^{a_1} \bar x_2^{a_2} \cdots \bar x_k^{a_k}]$, where Ave denotes the expectation or average
over all samples of s from the population. In deriving these results it has been
convenient to use the bracket notation developed by Tukey in [2] and [3] and
extended by Robson and Hooke in [4] and [5]. Univariate brackets are defined
by Tukey as

$$\langle p \rangle = \frac{1}{s} \sum_{i=1}^{s} x_i^{p}$$

and in general

$$\langle p_1 p_2 \cdots p_m \rangle = \frac{\sum_{\ne}^{s} x_i^{p_1} x_j^{p_2} \cdots x_q^{p_m}}{s(s-1)\cdots(s-m+1)},$$


Received June 15, 1960. 

1 This work was supported in part by the Department of the Army Project No. 5B 
99-01-004, Ord. R and D, Project No. PB2-0001, OOR Project No. 1715, Contract No. DA- 
36-034-ORD 2297, while the author was with the Statistical Techniques Research Group, 
Princeton University. 




where the summation takes place over unequal indices and the denominator
consists of the total number of terms in the numerator. These expressions are
"inherited on the average," which is to say that their average over all samples of
size s is equal to the same function of the n population elements. Using a prime
to denote the average value, we then have

$$\mathrm{Ave}[\langle p_1 p_2 \cdots p_m \rangle] = \langle p_1 p_2 \cdots p_m \rangle' = \frac{\sum_{\ne}^{n} x_i^{p_1} x_j^{p_2} \cdots x_q^{p_m}}{n(n-1)\cdots(n-m+1)}.$$

In extending this notation to a bivariate population $(x_{1u}, x_{2u})$, $u = 1, 2, \ldots, n$,
the bracket $\langle 1\,2 \rangle$ can take on several meanings. For example we may have the
symmetric means

$$\frac{\sum_{\ne}^{s} x_{1u} x_{2v}^{2}}{s(s-1)} \qquad \text{or} \qquad \frac{\sum_{\ne}^{s} x_{1u} x_{2u} x_{2v}}{s(s-1)}.$$

Departing slightly from Hooke and Robson we will adopt the notation $\langle 1^1, 2^2 \rangle$
and $\langle 1^1 2^1, 2^1 \rangle$ respectively for the above two expressions. An obvious extension
of the principle yields multivariate brackets of any desired order. By using
this notation we may omit from the bracket expression any vector elements
whose exponents are zero. That is, if we consider a k-variate population
$(x_{1u}, x_{2u}, \ldots, x_{ku})$ we may use

$$\langle 1^2, 2^2, 3^2 \rangle = \frac{\sum_{\ne}^{s} x_{1u}^{2} x_{2v}^{2} x_{3w}^{2}}{s(s-1)(s-2)}$$

to represent a symmetric mean involving the first three population elements only,
while $\langle i^2, j^2, k^2 \rangle$ represents the same expression for a general set of three
elements chosen from the population vector. It can be noted here that, by definition,
primed brackets involving commas are not regarded as population moments.

Multivariate brackets are also inherited on the average. By making use of this
convenient property the derivation of moment formulas is considerably simplified.
In finding the sampling moments of means of samples of s drawn from a
univariate population of n we seek

$$\mathrm{Ave}[\bar x^{a}] = \mathrm{Ave}\left[\left(\frac{x_1 + x_2 + \cdots + x_s}{s}\right)^{a}\right] = \mathrm{Ave}[\langle 1 \rangle^{a}],$$

or in general for the k-variate case we seek

$$\mathrm{Ave}[\bar x_1^{a_1} \bar x_2^{a_2} \cdots \bar x_k^{a_k}] = \mathrm{Ave}[\langle 1 \rangle^{a_1} \langle 2 \rangle^{a_2} \cdots \langle k \rangle^{a_k}].$$

We may first expand the product $\langle 1 \rangle^{a_1} \langle 2 \rangle^{a_2} \cdots \langle k \rangle^{a_k}$ as a linear function of
multivariate brackets and then take the average by simply adding primes
to each bracket. Each of these brackets can then be expanded in terms of single
index summations, i.e., in terms of the population moments. This is most easily
accomplished by using tables of symmetric functions [6]. Although these tables
provide formulas for univariate summations only, they are helpful in writing the
multivariate generalizations.
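
The "inherited on the average" property can be checked by direct enumeration. The following is a minimal sketch for the univariate bracket, not part of the original derivation; the function names and the five-element population are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations


def bracket(values, powers):
    """Symmetric mean <p1 p2 ... pm>: the average of x_i^p1 * x_j^p2 * ...
    over all ordered selections of m distinct elements of `values`."""
    m = len(powers)
    total = Fraction(0)
    count = 0  # ends up equal to s(s-1)...(s-m+1)
    for idx in permutations(range(len(values)), m):
        term = Fraction(1)
        for i, p in zip(idx, powers):
            term *= Fraction(values[i]) ** p
        total += term
        count += 1
    return total / count


def average_over_samples(population, s, powers):
    """Ave[<p1 ... pm>]: mean of the sample bracket over all samples of size s
    drawn without replacement."""
    samples = list(combinations(population, s))
    return sum(bracket(sample, powers) for sample in samples) / len(samples)


# Hypothetical population, used only to illustrate the property.
population = [1, 2, 4, 7, 11]
sample_average = average_over_samples(population, 3, (1, 2))
population_bracket = bracket(population, (1, 2))
```

Exact rational arithmetic is used so that the two quantities can be compared for equality rather than to within rounding error.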


Illustrative Example. The method is illustrated below for the trivial case
$\mathrm{Ave}[\bar x_i \bar x_j]$.

$$\mathrm{Ave}[\bar x_i \bar x_j] = s^{-1}\,\mathrm{Ave}[\langle ij \rangle + (s-1)\langle i, j \rangle] = s^{-1}[\langle ij \rangle' + (s-1)\langle i, j \rangle'].$$

The bracket $\langle i, j \rangle'$ is not a population moment and is therefore expanded
in the form

$$\langle i, j \rangle' = \frac{n^{2}}{n^{(2)}} \langle i \rangle' \langle j \rangle' - \frac{n}{n^{(2)}} \langle ij \rangle'.$$

The desired moment can then be written as

$$\mathrm{Ave}[\bar x_i \bar x_j] = \frac{1}{s^{2}} \left[ \left( s - n \frac{s^{(2)}}{n^{(2)}} \right) \langle ij \rangle' + n^{2} \frac{s^{(2)}}{n^{(2)}} \langle i \rangle' \langle j \rangle' \right],$$

where we use the factorial notation $n^{(p)} = n(n-1)(n-2)\cdots(n-p+1)$.
While for higher moments the initial expansion of the expectation in brackets
involves many terms, they may readily be written down in systematic fashion
by simply forming all possible comma partitions within the bracket, utilizing in
turn from 1 to k − 1 commas.

The second moment of the ith element of the vector is easily obtained by
letting i = j and hence obtaining the familiar univariate expression

$$\mathrm{Ave}[\bar x_i^{2}] = \frac{1}{s^{2}} \left[ \left( s - n \frac{s^{(2)}}{n^{(2)}} \right) \langle i^{2} \rangle' + n^{2} \frac{s^{(2)}}{n^{(2)}} (\langle i \rangle')^{2} \right] = \frac{n-s}{s(n-1)} \langle i^{2} \rangle' + \frac{n^{2} s^{(2)}}{s^{2} n^{(2)}} (\langle i \rangle')^{2}.$$

The coefficient of $\langle i^{2} \rangle'$ in the above expression for the moment could be and
was simplified in this case. We will in general however leave the coefficients in
the unsimplified form which is convenient for higher order moments where such
simplification is not always possible or desirable. As can be seen in the table of
results which follows, moments of order a will involve factorial coefficient terms
up to and including $s^{(a)}/n^{(a)}$. These formulas only apply strictly however when
$n \ge a$, since for $n < a$ only terms up to $s^{(n)}/n^{(n)}$ can be obtained. It is convenient
however to list the formulas for $n \ge a$ and to leave it to the user to delete the
meaningless coefficient terms should the order of the moment exceed the population
size. Thus to retain generality these factorial elements of the coefficients are
left in identifiable form. Actually, of course, these terms in the coefficients only
extend to $s^{(s)}/n^{(s)}$ when $s < a$, but as long as either $n \ge a$, or terms of higher
order than $s^{(n)}/n^{(n)}$ are deleted when $n < a$, the unwanted terms will automatically
vanish since $s^{(s+1)} = 0$.
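
The second-order result can be spot-checked numerically. The sketch below, which is an illustration and not part of the original paper, compares the closed form for the expected product of two sample means against a brute-force average over all samples drawn without replacement. The bivariate population and all function names are hypothetical; the two coefficients are the ones appearing in the illustrative example above.

```python
from fractions import Fraction
from itertools import combinations


def falling(n, p):
    """Factorial power n^(p) = n(n-1)...(n-p+1)."""
    out = 1
    for t in range(p):
        out *= n - t
    return out


def pop_moment(pop, powers):
    """Population moment: (1/n) * sum over u of prod_j x_{ju}^{a_j}."""
    total = Fraction(0)
    for vec in pop:
        term = Fraction(1)
        for x, a in zip(vec, powers):
            term *= Fraction(x) ** a
        total += term
    return total / len(pop)


def unit_powers(k, *indices):
    """Exponent vector with one unit of power for each listed coordinate."""
    powers = [0] * k
    for t in indices:
        powers[t] += 1
    return tuple(powers)


def by_enumeration(pop, s, i, j):
    """Average of (mean of coordinate i) * (mean of coordinate j) over all samples."""
    samples = list(combinations(pop, s))
    total = Fraction(0)
    for sample in samples:
        mean_i = Fraction(sum(v[i] for v in sample), s)
        mean_j = Fraction(sum(v[j] for v in sample), s)
        total += mean_i * mean_j
    return total / len(samples)


def by_formula(pop, s, i, j):
    """Closed form: a21 * <ij>' + a22 * <i>' * <j>'."""
    n, k = len(pop), len(pop[0])
    a21 = Fraction(1, s * s) * (s - Fraction(n * falling(s, 2), falling(n, 2)))
    a22 = Fraction(n * n * falling(s, 2), s * s * falling(n, 2))
    return (a21 * pop_moment(pop, unit_powers(k, i, j))
            + a22 * pop_moment(pop, unit_powers(k, i)) * pop_moment(pop, unit_powers(k, j)))


# Hypothetical bivariate population of n = 5 vectors.
population = [(1, 2), (3, -1), (0, 5), (4, 4), (2, 7)]
```

Setting i = j in either function gives the univariate second moment of the sample mean.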


In order to consolidate these results only the most general moment of the
bulky fifth and sixth order expressions is given since any other moment of the
same order is easily obtained by equating subscripts. That is, for example,
$\mathrm{Ave}[\bar x_i^{2} \bar x_j^{2} \bar x_k^{2}]$ can be obtained by letting i = l, j = m and k = n in the expression
for $\mathrm{Ave}[\bar x_i \bar x_j \bar x_k \bar x_l \bar x_m \bar x_n]$.

The table which follows summarizes results obtained by use of this procedure.


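Sampling Moments of Means of Samples of Size s from a Multivariate Population
of n Vector Elements

A. Moment Formulas

$$\mathrm{Ave}[\bar x_i] = \langle i \rangle'.$$

$$\mathrm{Ave}[\bar x_i \bar x_j] = A_{21} \langle ij \rangle' + A_{22} \langle i \rangle' \langle j \rangle'.$$

$$\mathrm{Ave}[\bar x_i^{2}] = A_{21} \langle i^{2} \rangle' + A_{22} (\langle i \rangle')^{2}.$$

$$\mathrm{Ave}[\bar x_i \bar x_j \bar x_k] = A_{31} \langle ijk \rangle' + A_{32} (\langle i \rangle' \langle jk \rangle' + \langle j \rangle' \langle ik \rangle' + \langle k \rangle' \langle ij \rangle') + A_{33} \langle i \rangle' \langle j \rangle' \langle k \rangle'.$$

$$\mathrm{Ave}[\bar x_i^{2} \bar x_j] = A_{31} \langle i^{2} j \rangle' + A_{32} (2 \langle i \rangle' \langle ij \rangle' + \langle i^{2} \rangle' \langle j \rangle') + A_{33} (\langle i \rangle')^{2} \langle j \rangle'.$$

$$\mathrm{Ave}[\bar x_i^{3}] = A_{31} \langle i^{3} \rangle' + 3 A_{32} \langle i \rangle' \langle i^{2} \rangle' + A_{33} (\langle i \rangle')^{3}.$$

$$\begin{aligned}
\mathrm{Ave}[\bar x_i \bar x_j \bar x_k \bar x_l] ={}& A_{41} \langle ijkl \rangle' + A_{42} (\langle i \rangle' \langle jkl \rangle' + \langle j \rangle' \langle ikl \rangle' + \langle k \rangle' \langle ijl \rangle' + \langle l \rangle' \langle ijk \rangle') \\
&+ A_{43} (\langle ij \rangle' \langle kl \rangle' + \langle ik \rangle' \langle jl \rangle' + \langle il \rangle' \langle jk \rangle') \\
&+ A_{44} (\langle i \rangle' \langle j \rangle' \langle kl \rangle' + \langle i \rangle' \langle k \rangle' \langle jl \rangle' + \langle i \rangle' \langle l \rangle' \langle jk \rangle' \\
&\qquad + \langle j \rangle' \langle k \rangle' \langle il \rangle' + \langle j \rangle' \langle l \rangle' \langle ik \rangle' + \langle k \rangle' \langle l \rangle' \langle ij \rangle') \\
&+ A_{45} \langle i \rangle' \langle j \rangle' \langle k \rangle' \langle l \rangle'.
\end{aligned}$$

$$\begin{aligned}
\mathrm{Ave}[\bar x_i^{2} \bar x_j \bar x_k] ={}& A_{41} \langle i^{2} jk \rangle' + A_{42} (2 \langle i \rangle' \langle ijk \rangle' + \langle j \rangle' \langle i^{2} k \rangle' + \langle k \rangle' \langle i^{2} j \rangle') \\
&+ A_{43} (2 \langle ij \rangle' \langle ik \rangle' + \langle i^{2} \rangle' \langle jk \rangle') \\
&+ A_{44} (2 \langle i \rangle' \langle j \rangle' \langle ik \rangle' + 2 \langle i \rangle' \langle k \rangle' \langle ij \rangle' + (\langle i \rangle')^{2} \langle jk \rangle' + \langle i^{2} \rangle' \langle j \rangle' \langle k \rangle') \\
&+ A_{45} (\langle i \rangle')^{2} \langle j \rangle' \langle k \rangle'.
\end{aligned}$$

$$\begin{aligned}
\mathrm{Ave}[\bar x_i^{3} \bar x_j] ={}& A_{41} \langle i^{3} j \rangle' + A_{42} (3 \langle i \rangle' \langle i^{2} j \rangle' + \langle j \rangle' \langle i^{3} \rangle') + 3 A_{43} \langle i^{2} \rangle' \langle ij \rangle' \\
&+ 3 A_{44} (\langle i \rangle' \langle i^{2} \rangle' \langle j \rangle' + (\langle i \rangle')^{2} \langle ij \rangle') + A_{45} (\langle i \rangle')^{3} \langle j \rangle'.
\end{aligned}$$

$$\begin{aligned}
\mathrm{Ave}[\bar x_i^{2} \bar x_j^{2}] ={}& A_{41} \langle i^{2} j^{2} \rangle' + 2 A_{42} (\langle i \rangle' \langle i j^{2} \rangle' + \langle j \rangle' \langle i^{2} j \rangle') + A_{43} [2 (\langle ij \rangle')^{2} + \langle i^{2} \rangle' \langle j^{2} \rangle'] \\
&+ A_{44} [4 \langle i \rangle' \langle j \rangle' \langle ij \rangle' + (\langle i \rangle')^{2} \langle j^{2} \rangle' + \langle i^{2} \rangle' (\langle j \rangle')^{2}] + A_{45} (\langle i \rangle')^{2} (\langle j \rangle')^{2}.
\end{aligned}$$

$$\mathrm{Ave}[\bar x_i^{4}] = A_{41} \langle i^{4} \rangle' + 4 A_{42} \langle i \rangle' \langle i^{3} \rangle' + 3 A_{43} (\langle i^{2} \rangle')^{2} + 6 A_{44} (\langle i \rangle')^{2} \langle i^{2} \rangle' + A_{45} (\langle i \rangle')^{4}.$$

$$\begin{aligned}
\mathrm{Ave}[\bar x_i \bar x_j \bar x_k \bar x_l \bar x_m] ={}& A_{51} \langle ijklm \rangle' + A_{52} \{\text{5 distinct arrangements of } \langle i \rangle' \langle jklm \rangle'\} \\
&+ A_{53} \{\text{10 distinct arrangements of } \langle ij \rangle' \langle klm \rangle'\} \\
&+ A_{54} \{\text{10 distinct arrangements of } \langle i \rangle' \langle j \rangle' \langle klm \rangle'\} \\
&+ A_{55} \{\text{15 distinct arrangements of } \langle i \rangle' \langle jk \rangle' \langle lm \rangle'\} \\
&+ A_{56} \{\text{10 distinct arrangements of } \langle i \rangle' \langle j \rangle' \langle k \rangle' \langle lm \rangle'\} \\
&+ A_{57} \langle i \rangle' \langle j \rangle' \langle k \rangle' \langle l \rangle' \langle m \rangle'.
\end{aligned}$$

$$\begin{aligned}
\mathrm{Ave}[\bar x_i \bar x_j \bar x_k \bar x_l \bar x_m \bar x_n] ={}& A_{61} \langle ijklmn \rangle' + A_{62} \{\text{6 distinct arrangements of } \langle i \rangle' \langle jklmn \rangle'\} \\
&+ A_{63} \{\text{15 distinct arrangements of } \langle ij \rangle' \langle klmn \rangle'\} \\
&+ A_{64} \{\text{10 distinct arrangements of } \langle ijk \rangle' \langle lmn \rangle'\} \\
&+ A_{65} \{\text{15 distinct arrangements of } \langle i \rangle' \langle j \rangle' \langle klmn \rangle'\} \\
&+ A_{66} \{\text{60 distinct arrangements of } \langle i \rangle' \langle jk \rangle' \langle lmn \rangle'\} \\
&+ A_{67} \{\text{15 distinct arrangements of } \langle ij \rangle' \langle kl \rangle' \langle mn \rangle'\} \\
&+ A_{68} \{\text{20 distinct arrangements of } \langle i \rangle' \langle j \rangle' \langle k \rangle' \langle lmn \rangle'\} \\
&+ A_{69} \{\text{45 distinct arrangements of } \langle i \rangle' \langle j \rangle' \langle kl \rangle' \langle mn \rangle'\} \\
&+ A_{6,10} \{\text{15 distinct arrangements of } \langle i \rangle' \langle j \rangle' \langle k \rangle' \langle l \rangle' \langle mn \rangle'\} \\
&+ A_{6,11} \langle i \rangle' \langle j \rangle' \langle k \rangle' \langle l \rangle' \langle m \rangle' \langle n \rangle'.
\end{aligned}$$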
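
The numbers of "distinct arrangements" quoted for the fifth and sixth order moments are the numbers of ways of partitioning the subscripts into groups of given sizes. The following sketch, added for illustration only, enumerates the set partitions of the subscripts and tallies them by block sizes; the function names are hypothetical.

```python
from collections import Counter


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # `first` forms a block of its own ...
        yield [[first]] + part
        # ... or joins one of the existing blocks.
        for b in range(len(part)):
            yield part[:b] + [[first] + part[b]] + part[b + 1:]


def arrangement_counts(order):
    """Number of partitions of `order` subscripts, keyed by sorted block sizes."""
    counts = Counter()
    for part in set_partitions(list(range(order))):
        shape = tuple(sorted((len(block) for block in part), reverse=True))
        counts[shape] += 1
    return counts


fifth = arrangement_counts(5)
sixth = arrangement_counts(6)
```

A key such as `(3, 2, 1)` stands for a product of one three-subscript moment, one two-subscript moment and one single-subscript moment, so that `sixth[(3, 2, 1)]` is the count attached to the seventh term of the sixth order formula.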


B. Coefficients²


8 ) 


n® 


n® 


(3) (4) (5) 
es 8 8 
—T—- +18, -—6— 
n® n® ? 


(2) (3) (4) (5) 
8 8 of 8 
=—|—, -44, +5 -2-5), 
9 (5 n® n® n® 


² When the order of the moment exceeds the population size, i.e., a > n, terms in the
coefficients of the form $s^{(p)}/n^{(p)}$ where p > n are to be deleted from the expression.







3) (4) ,(5) (6) 


— 299° een % a - 
@ — 300; + 360 — 12075), 


31 °_ + 1805 
7m 







REFERENCES 

[1] Box, G. E. P. and Behnken, D. W., "Simplex-sum designs: a class of second order
rotatable designs derivable from those of first order," Ann. Math. Stat., Vol. 31
(1960), pp. 838-864.

[2] Tukey, John W., "Some sampling simplified," J. Amer. Stat. Assn., Vol. 45 (1950),
pp. 501-519.

[3] Tukey, John W., "Keeping moment-like sampling computations simple," Ann. Math.
Stat., Vol. 27 (1956), pp. 37-54.

[4] Hooke, Robert, "Symmetric functions of a two way array," Ann. Math. Stat., Vol. 27
(1956), pp. 55-79.

[5] Robson, D. S., "Application of multivariate polykays to the theory of unbiased
ratio-type estimation," J. Amer. Stat. Assn., Vol. 52 (1957), pp. 511-522.

[6] David, F. N. and Kendall, M. G., "Tables of symmetric functions, Part I,"
Biometrika, Vol. 36 (1949), pp. 431-449.





ON THE FOUNDATIONS OF STATISTICAL INFERENCE: BINARY 
EXPERIMENTS¹


By ALLAN BIRNBAUM 


Institute of Mathematical Sciences, New York University 


0. Introduction and summary. In Part A (Sections 1—5) the canonical forms 
of experiments concerning two simple hypotheses, and their partial ordering, 
are discussed. It is proved that every such experiment is a mixture (in a prob- 
ability sense) of simple experiments whose sample spaces contain only two 
points. In Part B (Sections 6-8) some general aspects of inference and decision
problems are discussed in the usual theoretical framework, in which the overall 
mathematical model of an experiment is the frame of reference for all inter- 
pretations of outcomes. 

In Part C (Sections 9-16), attention is directed to that traditional function 
and basic problem of mathematical statistics, called here ‘‘informative inference,”’ 
whose object is to recognize and report in appropriate objective terms those 
features of experimental outcomes which constitute statistical evidence relevant 
to hypotheses (or parameter values) of interest. The mathematical structure of 
statistical evidence, and its qualitative and quantitative properties, are analyzed 
by application of (1) the mathematical results of Part A, which show that condi- 
tional experimental frames of reference (in the mixture sense) exist and are 
recognizable much more widely than has previously been realized; and (2) a 
single extra-mathematical proposition which many statisticians seem inclined to 
accept as appropriate for purposes of informative inference, a ‘“‘principle of 
conditionality” which asserts that any outcome of any experiment which is a 
mixture of component experiments should be interpreted in the same way as if 
it were an outcome of just a corresponding component experiment (with the 
overall mixture structure otherwise ignored). This analysis establishes the likeli- 
hood function as the appropriate basis from which statistical inferences can be 
made directly without other reference to the structure of an experiment. For the 
numerical values of the likelihood function, this analysis provides direct inter- 
pretations in terms of probabilities of errors. These probabilities admit frequency 
interpretations of the usual kind, but they are not in general defined with refer- 
ence to the specific experiment from which an outcome is obtained: they express 
intrinsic objective properties of the likelihood function itself, which this analysis 
shows to be appropriately relevant and directly useful for purposes of informa- 
tive inference. The relations of this analysis of problems of informative inference 
to problems of testing statistical hypotheses, decision-making, conclusions, and 
Bayesian treatments of inference problems are discussed briefly. 


Received July 30, 1960; revised October 11, 1960. 
1 Prepared under the sponsorship of the Office of Naval Research, United States Navy, 
Contract No. Nonr-285 (38). 




Generalizations of these mathematical results and their interpretations for 
problems involving more than two simple hypotheses will be given in a following 
paper. 


A. MATHEMATICAL DEVELOPMENTS 


1. The canonical form of a binary experiment. We consider a given experiment
E, assuming that questions of experimental design, including those of
choice of a sample size or possibly a sequential sampling rule, have been dealt
with, and that the sample space of possible outcomes x of E is a specified set
S = {x}. We assume that each of the possible distributions of X is represented
by a specified elementary probability function $f_i(x)$: if the hypothesis $H_i$ is true,
the probability that E yields an outcome x in A is

(1.1) $P_i(A) = \int_A f_i(x)\, d\mu(x),$

where μ is a specified σ-finite measure on S, and A is any measurable set. We
assume until otherwise stated that there are only two possible distributions, so
that i = 1 or 2. Such experiments will be termed binary experiments.

Discussions of statistical inference problems concerning binary experiments 
usually specify at the outset that the problem under consideration is that of 
testing the simple hypothesis $H_1$ against the simple alternative $H_2$, or that of
making one of two specified decisions, on the basis of an observed value of X. 
These discussions seem to assume tacitly that such formulations are the only 
ones of possible interest, or at least the only ones sufficiently definite to allow 
satisfactory theoretical treatment and objective practical application. (We do 
not consider here formulations in which it is assumed that there exist probabilities 
of the hypotheses themselves, Prob($H_i$), i = 1, 2, in some sense.) We begin
however with a less formal but broader specification: the general goal is to make 
inferences from an observed value of X to the hypotheses. Our purpose is to 
show that this broader specification suffices to guide a useful analysis of the 
mathematical structure of any given experiment E, an analysis which exhibits
some new mathematical properties of experiments that are of intrinsic interest 
and relevance for statistical inference in general, and throws some new light on 
more specialized formulations of inference problems. 

For any given binary experiment E, let

(1.2) $r = r(x) = \log [f_2(x)/f_1(x)].$

It is well known that r is a (minimal) sufficient statistic. Let

(1.3) $F_i(r) = \mathrm{Prob}[r(X) \le r \mid H_i], \quad i = 1, 2.$

In general r(X) is a generalized random variable in the sense that it may assume
infinite values with positive probability under one or both hypotheses;
correspondingly, in general $F_1$ and $F_2$ are generalized cumulative distribution
functions (c.d.f.'s). The pair of distributions $F_1$, $F_2$ of r may be taken as a canonical
form of any binary experiment E.


A canonical form which is more convenient for many purposes is obtained as
follows: Let

(1.4) $u(r, z) = zF_1(r) + (1 - z)F_1(r-),$

for $0 < z \le 1$ and $-\infty \le r \le \infty$. If Z is an auxiliary randomization variable,
that is, a random variable having under each hypothesis the same uniform
distribution on the unit interval, $0 < Z \le 1$, independent of X, then U =
u(r(X), Z) may be called the continuous probability integral transform of
R = r(X), since

(1.5) $\mathrm{Prob}[u(R, Z) \le u \mid H_1] = u, \quad \text{for } 0 \le u \le 1.$

Since r is a function of u(r, z), the latter is a sufficient statistic. For each u, let
$v(u) = \mathrm{Prob}[u(R, Z) \le u \mid H_2]$, $0 \le u \le 1$. The function (c.d.f.) v(u) may
be regarded as the canonical form of the given binary experiment E as was
pointed out in [1]. (For each u, by the fundamental lemma of Neyman and
Pearson a best test of size 1 − u is one which rejects $H_1$ when u(r, z) exceeds u;
with this test, the probability of a Type II error is v(u). The latter is well known
to be a convex function of u.) Since v(u) is convex, it is continuous, except possibly
at u = 1, where v(1) = 1 always.

Conversely, each convex c.d.f. v(u) on the closed unit interval is the canonical
form of some binary experiment. For if v(u) is convex and v(0) = 0, v(1) = 1,
let $f_2(u)$ denote v'(u), the right derivative of v(u), for each u < 1, and let
$f_2(1) = \infty$. Let $f_1(u) = 1$, $0 \le u \le 1$. Then the binary experiment E represented
by the elementary probability functions $f_1(u)$, $f_2(u)$ (with respect to Lebesgue
measure) has the canonical form v(u), as is readily verified.

It is often convenient to consider a binary experiment as represented by the
graph of its "v(u) curve," with the latter supplemented by a vertical line-segment
if necessary so as to give in all cases a graphically-continuous convex curve
from (0, 0) to (1, 1).


2. Simple binary experiments. A binary experiment with v(u) = u is trivial
in the sense that its sufficient statistic r = r(x) has the same distribution under
each hypothesis. Such experiments will be called uninformative, and all other
experiments will be called informative.

A binary experiment will be called simple if its sufficient statistic r assumes at
most two distinct values, $r_1 \le r_2$ (with exceptions on sets of points x having
probability 0 under each hypothesis). A binary experiment which is not simple
will be called composite. In an informative simple binary experiment, we have
$r_1 < r_2$, each value having positive probability under at least one hypothesis.
In any such experiment, let

(2.1) $p_i = \mathrm{Prob}[r(X) = r_2 \mid H_i]$, and $q_i = 1 - p_i$, for i = 1, 2.

Then $0 \le p_1 < p_2 \le 1$, or $0 \le q_2 < q_1 \le 1$; the point $(q_1, q_2)$ characterizes any
such experiment, since its v(u) curve consists of two line segments connecting
successively the points (0, 0), $(q_1, q_2)$, (1, 1).

Conversely, every such v(u) curve, or every point $(q_1, q_2)$ with $0 \le q_2 <
q_1 \le 1$, characterizes an informative simple binary experiment. For consider any
such pair and the experiment E consisting of a single Bernoulli trial such that

(2.2) $q_i = \mathrm{Prob}[X = 0 \mid H_i]$, and $p_i = 1 - q_i = \mathrm{Prob}[X = 1 \mid H_i]$, i = 1, 2.

Its sufficient statistic is

(2.3) $r(x) = \begin{cases} r_1 = \log (q_2/q_1) & \text{if } x = 0, \\ r_2 = \log (p_2/p_1) & \text{if } x = 1. \end{cases}$

Any such experiment may be characterized alternatively by a point $(r_1, r_2)$
satisfying $-\infty \le r_1 < 0 < r_2 \le \infty$, that is by a point in the second quadrant
of the $(r_1, r_2)$-plane excluding the coordinate axes but including all points with
one or both coordinates infinite.

A third representation of any informative simple binary experiment is given
by the ordered pair $(L_1, L_2)$ of possible values of the likelihood ratio statistic:

(2.4) $L_1 = q_2/q_1 = e^{r_1}, \quad L_2 = p_2/p_1 = e^{r_2}, \quad 0 \le L_1 < 1 < L_2 \le \infty,$

so that $q_1 = (L_2 - 1)/(L_2 - L_1)$ and $q_2 = L_1 q_1$. A fourth representation is given
by considering the only nontrivial nonrandomized best test of $H_1$ against $H_2$,
which rejects $H_1$ just when $r(x) = r_2$; the probabilities of errors of Types I and
II respectively are $(\alpha, \beta) = (p_1, q_2)$, which satisfy $\alpha + \beta < 1$. A fifth useful
representation of any such experiment is by means of a stochastic matrix:

(2.5) $E = \begin{pmatrix} q_1 & p_1 \\ q_2 & p_2 \end{pmatrix}.$

An uninformative simple binary experiment is represented by $(r_1, r_2) = (0, 0)$,
or by $(L_1, L_2) = (1, 1)$, or by $(q_1, q_2) = (q_1, q_1)$ for any $q_1$, or by $(\alpha, \beta) =
(\alpha, 1 - \alpha)$ for any α.
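
The relations among these representations are simple enough to tabulate mechanically. The following sketch is an illustration only, with hypothetical function names; it converts a pair of likelihood ratio values into the other representations and is restricted to finite, nonzero ratios so that the logarithms exist.

```python
import math


def from_likelihood_ratios(L1, L2):
    """Given (L1, L2) with 0 < L1 < 1 < L2 < infinity, return the equivalent
    representations (q1, q2), (p1, p2), (r1, r2) and (alpha, beta)."""
    if not (0 < L1 < 1 < L2 < math.inf):
        raise ValueError("this sketch assumes 0 < L1 < 1 < L2 < infinity")
    q1 = (L2 - 1) / (L2 - L1)   # from q2 = L1*q1 and p2 = L2*p1
    q2 = L1 * q1
    return {
        "q": (q1, q2),
        "p": (1 - q1, 1 - q2),
        "r": (math.log(L1), math.log(L2)),
        "alpha_beta": (1 - q1, q2),   # (alpha, beta) = (p1, q2)
    }


def from_error_probabilities(alpha, beta):
    """Likelihood ratio pair of the simple experiment with errors (alpha, beta)."""
    if not (alpha > 0 and beta > 0 and alpha + beta < 1):
        raise ValueError("this sketch assumes alpha, beta > 0 and alpha + beta < 1")
    return beta / (1 - alpha), (1 - beta) / alpha


example = from_likelihood_ratios(1 / 16, 16)
```

For instance, the pair (1/16, 16) gives error probabilities of about .0588 each.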

EXAMPLE 1. 

“One toss of a coin’ experiments. As indicated above, every simple binary 
experiment is equivalent to an experiment consisting of a single observation on 
a Bernoulli random variable X with possible values 0 or 1 only. 

EXAMPLE 2. 

A Wald sequential probability ratio test between two simple hypotheses, in
special cases including certain tests on a binomial parameter (the cases in
which there is "no excess at termination"), is based on a sequential sampling
rule which allows only two values for the likelihood ratio statistic, or
for r(x). In many other cases, such tests might be called approximately simple
in the sense that under each hypothesis the probability of $r(X) = r_1$ or $r_2$ is
very near unity.

EXAMPLE 3. 

Communication channels. In communication theory (information theory), a
communication channel (without memory) is any structure which can receive
at one point any one of a specified set of "input signals" and deliver at another
point one of a designated set of "output signals", the respective probabilities of
the latter depending only upon the selected input signal. In the case of just two
input signals, which we may denote by $H_1$, $H_2$, we have a binary channel; we
may denote the set of possible output signals by S = {x}, and the respective
probabilities of subsets A of S by $P_i(A)$, i = 1, 2. Thus each such communication
channel is mathematically equivalent to a binary experiment, and conversely.
If x = 0 or 1 only, we have a simple binary ("two-by-two") channel, equivalent
to a simple binary experiment. Here (α, β) describe completely the structure of
"noise" in the channel: α is the probability that transmission of $H_1$ will lead to
receiving of x = 1, and β is the probability that transmission of $H_2$ will lead to
receiving of x = 0.

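Noisy channels in series. It is convenient to introduce some techniques
required below as an elaboration of the present example. Let channel E have
inputs $H_1$, $H_2$, outputs x = 0 or 1, and noise parameters (α, β). Let channel E'
have inputs x = 0 or 1, outputs x' = 0 or 1, and noise parameters (α', β'). Then
the channel E* consisting of E followed by E' has inputs $H_1$, $H_2$, and outputs
x' = 0 or 1. It is useful to write E* = EE', since if

(2.6) $E = \begin{pmatrix} q_1 & p_1 \\ q_2 & p_2 \end{pmatrix}$ and $E' = \begin{pmatrix} q_1' & p_1' \\ q_2' & p_2' \end{pmatrix},$

then

(2.7) $E^* = \begin{pmatrix} q_1 & p_1 \\ q_2 & p_2 \end{pmatrix} \begin{pmatrix} q_1' & p_1' \\ q_2' & p_2' \end{pmatrix} = \begin{pmatrix} q_1 q_1' + p_1 q_2' & q_1 p_1' + p_1 p_2' \\ q_2 q_1' + p_2 q_2' & q_2 p_1' + p_2 p_2' \end{pmatrix} = \begin{pmatrix} q_1^* & p_1^* \\ q_2^* & p_2^* \end{pmatrix}.$

The noise parameters of E* are

(2.8) $(\alpha^*, \beta^*) = (p_1^*, q_2^*) = ((1 - \alpha)\alpha' + \alpha(1 - \beta'),\ \beta(1 - \alpha') + (1 - \beta)\beta').$

The other representations of E* include

(2.9) $L_1^* = q_2^*/q_1^* = (q_2 q_1' + p_2 q_2')/(q_1 q_1' + p_1 q_2'),$
      $L_2^* = p_2^*/p_1^* = (q_2 p_1' + p_2 p_2')/(q_1 p_1' + p_1 p_2').$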



If $q_2' = 0$ but $p_1' > 0$, we may say that E' has noise affecting only the transmitted
signal x = 0; in this case we may also say that E' has noise which degrades only
the received signal x' = 1, since the received signal x' = 0 is known with certainty
to follow from a transmitted signal x = 0, while a received signal x' = 1 is
known to be possible following either transmitted signal x = 0 or 1. In such a
case we have $L_1^* = q_2/q_1 = L_1$, and

(2.10) $L_2^* = (p_2 + p_1' q_2)/(p_1 + p_1' q_1) < p_2/p_1 = L_2$

(assuming $p_1 < p_2$, the remaining case being trivial). Similarly if $p_1' = 0$ but
$q_2' > 0$, E' has noise affecting only x = 1 and degrading only x' = 0, and
$L_2^* = p_2/p_1 = L_2$,

(2.11) $L_1^* = (q_2 + q_2' p_2)/(q_1 + q_2' p_1) > q_2/q_1 = L_1$

(assuming the nontrivial case $p_1 < p_2$). It is easily verified that every channel E'
is equivalent to a pair of channels in series, $E' = E_1 E_2$, where $E_1$ has noise
affecting at most the signal x = 1, and $E_2$ has noise affecting at most the signal
x = 0.

It follows that for any simple binary channels E, with parameters $(L_1, L_2)$,
and E', the channel E* = EE' has parameters $(L_1^*, L_2^*)$ satisfying
$L_1 \le L_1^* \le 1 \le L_2^* \le L_2$. And conversely, if E and E* are channels with parameters
satisfying these inequalities, then there exists a channel E' such that E* = EE'. Since
$r_i = \log L_i$, these inequalities may be written

(2.12) $r_1 \le r_1^* \le 0 \le r_2^* \le r_2.$
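
The series construction is an ordinary product of two-by-two stochastic matrices, and can be sketched as follows. The code is illustrative only; the function names and the particular noise parameters are hypothetical, and both channels are assumed to satisfy the condition that the two error probabilities sum to less than one.

```python
from fractions import Fraction


def channel(alpha, beta):
    """Stochastic matrix ((q1, p1), (q2, p2)) of a simple binary channel whose
    noise parameters are alpha = p1 and beta = q2."""
    return ((1 - alpha, alpha), (beta, 1 - beta))


def in_series(first, second):
    """Channel obtained by feeding the output of `first` into `second`
    (matrix product of the two stochastic matrices)."""
    return tuple(
        tuple(sum(first[i][k] * second[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def noise_parameters(E):
    """(alpha, beta) = (p1, q2)."""
    return E[0][1], E[1][0]


def likelihood_ratios(E):
    """(L1, L2) = (q2/q1, p2/p1)."""
    (q1, p1), (q2, p2) = E
    return q2 / q1, p2 / p1


# Hypothetical noise parameters, exact rationals to avoid rounding.
E = channel(Fraction(1, 10), Fraction(1, 5))
E_prime = channel(Fraction(1, 20), Fraction(3, 10))
E_star = in_series(E, E_prime)
```

With these values the combined channel has noise parameters (23/200, 43/100), and its likelihood ratio pair lies between that of E and the uninformative pair (1, 1).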


EXAMPLE 4. 

Significance Tests. In every binary experiment, if the outcome x is to be
reported only by a conclusion of the form "reject $H_1$" or "accept $H_1$" based on a
specified significance test with error-probabilities (α, β), then the over-all
procedure is formally a simple binary experiment, with $L_1 = \beta/(1 - \alpha)$,
$L_2 = (1 - \beta)/\alpha$.


3. The partial ordering of binary experiments. In the theory of comparison 
of experiments [2], an experiment E is called at least as informative as another 
experiment E* if and only if it is possible to use E, possibly supplemented by use 
of an auxiliary randomization variable, to construct an experiment equivalent 
to E*. (We depart from the usual terminology, in which “more informative 
than” is used so as to include the case of equivalence. ) 

To denote that E is at least as informative as E*, we write $E \ge E^*$ or $E^* \le E$.
It is also convenient to denote this relation by writing that E contains E*, since
this terminology has been used in connection with communication channels [3].

If $E \ge E^*$ and $E^* \ge E$, we write E = E* to denote that E is equivalent to E*.
We write $E \ne E^*$ to denote that E and E* are not equivalent. If $E \ge E^*$ and
$E \ne E^*$, we write E > E* to denote that E is more informative than E*. If neither
$E \ge E^*$ nor $E^* \ge E$ holds, E and E* are not comparable.





It is well known that, for binary experiments E: v(u) and E*: v*(u), we have
$E \ge E^*$ if and only if $v(u) \le v^*(u)$ for $0 \le u \le 1$. In the case of simple binary
experiments E: $(r_1, r_2)$ and E*: $(r_1^*, r_2^*)$, it is readily verified that this condition
specializes to: $E \ge E^*$ if and only if $r_1 \le r_1^* \le r_2^* \le r_2$; that is, if and only if the
interval $(r_1, r_2)$ contains the interval $(r_1^*, r_2^*)$.

The partial ordering of simple binary experiments determined by the relation
≥ is conveniently represented graphically in the $(r_1, r_2)$ plane. E > E* denotes
that $(r_1^*, r_2^*)$ is closer than $(r_1, r_2)$ to (0, 0) in the sense that at least one of its
coordinates is closer to 0 and neither is farther. In a case of non-comparability,
one of the points $(r_1, r_2)$, $(r_1^*, r_2^*)$ lies to the upper-right of the other.
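
For simple binary experiments the comparison therefore reduces to a test of interval containment. A minimal sketch, with a hypothetical function name and with each experiment given as its pair of log likelihood ratio values, is:

```python
import math


def compare(e, e_star):
    """Compare simple binary experiments e = (r1, r2) and e_star = (r1*, r2*)
    by containment of the corresponding intervals."""
    r1, r2 = e
    s1, s2 = e_star
    at_least = r1 <= s1 and s2 <= r2   # e is at least as informative as e_star
    at_most = s1 <= r1 and r2 <= s2    # e_star is at least as informative as e
    if at_least and at_most:
        return "equivalent"
    if at_least:
        return "more informative"
    if at_most:
        return "less informative"
    return "not comparable"


# Illustrative experiments, specified through their likelihood ratio pairs.
uninformative = (0.0, 0.0)
symmetric_16 = (math.log(1 / 16), math.log(16))
symmetric_256 = (math.log(1 / 256), math.log(256))
skew_a = (math.log(1 / 16), math.log(256))
skew_b = (math.log(1 / 256), math.log(16))
```

Here `compare(skew_a, skew_b)` returns "not comparable", since neither interval contains the other.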

Any finite or infinite set of experiments will be called strictly ordered if, of
every pair in the set, one is more informative. Each such set of experiments
corresponds to a subset of the points $(r_1, r_2)$ of some graphically-continuous
nonincreasing curve from $(-\infty, \infty)$ to (0, 0). Any such set of experiments has
a parametric representation $(r_1[d], r_2[d])$, with $r_1[d]$ nondecreasing and $r_2[d]$
nonincreasing in d, where d has a specified range.


4. Mixtures of simple binary experiments. If various experiments are possible
for a given inference problem, and if one of these is selected for use by means
of a specified random device unrelated to the hypotheses, the over-all procedure
is called a mixture of experiments, or a mixture experiment. Since each simple
binary experiment is represented by a point $(r_1, r_2)$ in the range described above,
the various (generalized) cumulative distribution functions $G(r_1, r_2)$ on that
range correspond to the possible mixtures of simple binary experiments. For any
such distribution G, we write $E_G$ to designate the (mixture) experiment
consisting of the selection of a simple experiment $(r_1, r_2)$ by use of a random device
corresponding to G, and the observation of the outcome of one trial of the
selected experiment; the simple experiments will be called components of $E_G$.

Any such mixture experiment $E_G$ has the generic sample point $x = (r_1, r_2, r_3)$,
where $(r_1, r_2)$ is the selected simple experiment and $r_3$ is the observed outcome
of that experiment, $r_3 = r_1$ or $r_2$. To determine the sufficient statistic r(x) =
$r(r_1, r_2, r_3)$ of such a mixture experiment, let $f_i(r_1, r_2, r_3)$ denote the probability
or probability density of $(r_1, r_2, r_3)$ if $H_i$ is true, i = 1, 2.

The conditional distributions of $R_3$, given $(R_1, R_2) = (r_1, r_2)$, are

(4.1) $\mathrm{Prob}[R_3 = r_1 \mid (r_1, r_2), H_i] = q_i$; and if $r_2 > r_1$,
      $\mathrm{Prob}[R_3 = r_2 \mid (r_1, r_2), H_i] = p_i = 1 - q_i$, i = 1, 2,

where $q_i = q_i(r_1, r_2)$ are determined as above by $r_1 = \log (q_2/q_1)$, $r_2 = \log (p_2/p_1)$.
If $r_1 = r_2 = 0$, then $R_3 = 0$, and we may take $p_1 = p_2 = 1$. Hence the marginal
probability or probability density of $(r_1, r_2)$ is

(4.2) $f_i(r_1, r_2) = \begin{cases} f_i(0, 0, 0), & \text{if } r_1 = r_2 = 0, \\ f_i(r_1, r_2, r_1) + f_i(r_1, r_2, r_2), & \text{if } r_1 < r_2, \end{cases}$

for i = 1, 2. However $f_1(r_1, r_2) = f_2(r_1, r_2)$ (a.e., $H_1$ and $H_2$), since the distribution
G of $(R_1, R_2)$ is independent of the hypotheses. Hence we can write

(4.3) $f_i(r_1, r_2, r_3) = \begin{cases} f_i(0, 0) & \text{if } r_1 = r_2 = r_3 = 0, \\ f_i(r_1, r_2)\, q_i & \text{if } r_3 = r_1 < r_2, \\ f_i(r_1, r_2)\, p_i & \text{if } r_3 = r_2 > r_1, \end{cases}$

for i = 1, 2. Hence the sufficient statistic of $E_G$, an arbitrary mixture $G(r_1, r_2)$
of simple binary experiments $(r_1, r_2)$, is

(4.4) $r(x) = r(r_1, r_2, r_3) = \log [f_2(r_1, r_2, r_3)/f_1(r_1, r_2, r_3)] = r_3.$


EXAMPLE 1.
Binomial mean. Consider the five simple binary experiments $E_0, E_1, \ldots, E_4$
defined by the respective pairs of parameters $(L_1, L_2)$ given in Table I below.

TABLE I
Some simple binary experiments

Experiment    (p_1, p_2)        (L_1, L_2)      (α, β)
E_0           (.5, .5)          (1, 1)          (.5, .5)
E_1           (.0588, .9412)    (1/16, 16)      (.0588, .0588)
E_2           (.0039, .9961)    (1/256, 256)    (.0039, .0039)
E_3           (.0037, .9377)    (1/16, 256)     (.0037, .0623)
E_4           (.0623, .9963)    (1/256, 16)     (.0623, .0037)

Some distributions defining mixtures of the above experiments

        G^0                                                   G^1                     G^c, 0 ≤ c ≤ 1
g_0     $\binom{4}{2}(.2)^2(.8)^2 = .1536$                    g_0                     g_0
g_1     $\binom{4}{1}(.2)(.8)^3 + \binom{4}{3}(.2)^3(.8) = .4352$    0                (1 − c)g_1
g_2     $\binom{4}{0}(.8)^4 + \binom{4}{4}(.2)^4 = .4112$     0                       (1 − c)g_2
g_3     0                                                     (1 − g_0)/2 = .4232     cg_3
g_4     0                                                     (1 − g_0)/2 = .4232     cg_4

The table gives also the parameters $(p_1, p_2)$ and (α, β) of these experiments to
four decimal accuracy. The table also gives a number of discrete distributions
$G^c = G^c(r_1, r_2)$: for each c, $0 \le c \le 1$, a mixture experiment $E_{G^c}$ is defined by
the five probabilities $g_j^c = \mathrm{Prob}(E_j)$, j = 0, 1, ..., 4. It is convenient to use the
notation $g_0 E_0 \oplus g_1 E_1 \oplus \cdots \oplus g_4 E_4$ to denote the operation of mixing the
experiments $E_0, \ldots, E_4$ with respective probabilities $g_0, \ldots, g_4$. We can then write,
for each c, $0 \le c \le 1$,

(4.5) $E_{G^c} = \sum_{j=0}^{4} \oplus\, g_j^c E_j.$



422 ALLAN BIRNBAUM 


Consider next the binomial experiment $E_B$ consisting of four observations,
with parameter $\theta = .2$ or $.8$:

(4.6)    $f_1(x) = \binom{4}{x}(.2)^x(.8)^{4-x}$,   $f_2(x) = \binom{4}{x}(.8)^x(.2)^{4-x}$,   $x = 0, 1, \ldots, 4$.

The following assertion can be verified by simple direct calculations: The mixture
experiments $E_{G^c}$ defined above are equivalent to one another, and each is
equivalent to $E_B$. That is, $E_{G^c} = E_B$ for each $c$, $0 \le c \le 1$. The $v(u)$ curve of $E_B$ is
easily determined from the given binomial distributions $f_i(x)$, and consists of
the line segments between the successive points (given to four-decimal accuracy):
(0, 0), (.4096, .0016), (.8192, .0272), (.9728, .1808), (.9984, .5904), and (1, 1).
It may be noted that only one of the above distributions $G^c$ represents a mixture
of strictly ordered simple binary experiments, namely $G^0 = G$.
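
The "simple direct calculations" can be sketched as follows for the case $c = 0$ only. This is an illustrative check, not part of the original text: it assumes the exact values $p_1 = 1/2, 1/17, 1/257$ for $E_0, E_1, E_2$ (consistent with the four-decimal entries of Table 1), and the helper names `lr_distribution` and `v_curve` are hypothetical. Two experiments are compared through the joint distribution of their likelihood ratios under the two hypotheses.

```python
from fractions import Fraction
from math import comb

def binom_pmf(theta, n=4):
    """Binomial(n, theta) probabilities for x = 0, ..., n."""
    return [comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(n + 1)]

f1 = binom_pmf(Fraction(1, 5))   # H1: theta = .2
f2 = binom_pmf(Fraction(4, 5))   # H2: theta = .8

def lr_distribution(outcomes):
    """Pool (prob under H1, prob under H2) pairs that share a likelihood ratio."""
    dist = {}
    for a, b in outcomes:
        a0, b0 = dist.get(b / a, (0, 0))
        dist[b / a] = (a0 + a, b0 + b)
    return dist

binomial = lr_distribution(zip(f1, f2))

# (p1, p2) for E0, E1, E2: assumed exact values matching Table 1 to four decimals.
experiments = [(Fraction(1, 2), Fraction(1, 2)),
               (Fraction(1, 17), Fraction(16, 17)),
               (Fraction(1, 257), Fraction(256, 257))]
# Mixing probabilities g0, g1, g2 of the distribution G^0 = G.
g = [f1[2], f1[1] + f1[3], f1[0] + f1[4]]

mixture = lr_distribution(
    pair
    for gj, (p1, p2) in zip(g, experiments)
    for pair in ((gj * (1 - p1), gj * (1 - p2)), (gj * p1, gj * p2))
)

def v_curve(dist):
    """Vertices of v(u): cumulate outcomes in order of increasing likelihood ratio."""
    u = v = Fraction(0)
    points = [(u, v)]
    for ratio in sorted(dist):
        a, b = dist[ratio]
        u, v = u + a, v + b
        points.append((u, v))
    return points

same = mixture == binomial
vertices = [(round(float(u), 4), round(float(v), 4)) for u, v in v_curve(binomial)]
print(same, vertices)
```

Exact rational arithmetic is used so that the comparison of the two likelihood-ratio distributions is an equality test rather than a tolerance check.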

EXAMPLE 2. 

Normal mean. The symmetric simple binary experiments $(r_1, r_2)$ are those for
which $r_1 = -r_2$. Any mixture $G(r_1, r_2)$ over this strictly ordered class of
experiments can be represented conveniently by the marginal c.d.f. of $R_2$ under $G$,
which we denote by $G(r_2)$. Let

(4.7)    $G(r_2) = \Phi(r_2 - \tfrac12) - \Phi(-r_2 - \tfrac12)$,   for $0 \le r_2 < \infty$,

where $\Phi(u) = \int_{-\infty}^{u} \phi(u)\,du$ and $\phi(u) = (2\pi)^{-1/2} \exp(-\tfrac12 u^2)$. Then

(4.8)    $G(r_2) = \int_0^{r_2} g(y)\,dy$,

where

$g(y) = \phi(y - \tfrac12) + \phi(-y - \tfrac12)$.

Under hypothesis $H_i$, the sufficient statistic $r_3$ of the mixture experiment $E_G$
has the density function

(4.9)    $f_i(r_3) = \begin{cases} g(r_2)\,q_i(-r_2, r_2) & \text{if } r_3 < 0, \\ g(r_2)\,p_i(-r_2, r_2) & \text{if } r_3 > 0, \\ \tfrac12\,g(r_2) & \text{if } r_3 = 0, \end{cases}$

where $r_2 = |r_3|$, $q_1(-r_2, r_2) = (e^{r_2} - 1)/(e^{r_2} - e^{-r_2})$, $q_2(-r_2, r_2) =
e^{-r_2} q_1(-r_2, r_2)$, and $p_i(-r_2, r_2) = 1 - q_i(-r_2, r_2)$, for $i = 1, 2$. Upon
simplification we find that $f_1(r_3) = \phi(r_3 + \tfrac12)$, $f_2(r_3) = \phi(r_3 - \tfrac12)$; thus the sufficient
statistic $r_3$ has under each hypothesis a normal distribution with unit variance,
with respective means $-\tfrac12$ and $\tfrac12$.

Consider next the experiment $E_N$ consisting of a single observation on a
normally distributed random variable $X$, having unit variance and, under the
respective hypotheses, means $-\tfrac12$ and $\tfrac12$. It is well known that for this
experiment the sufficient statistic is $r(x) = x$, which has under the respective
hypotheses the same (normal) distributions found in the above mixture experiment $E_G$
for its sufficient statistic $r_3$. It follows that the two experiments are equivalent:
$E_N = E_G$.
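
The simplification of (4.9) can be checked numerically. The following is a minimal sketch, not part of the original text; the function names and the evaluation grid are illustrative assumptions. It evaluates the three branches of (4.9) and compares them with the normal densities $\phi(r_3 + \tfrac12)$ and $\phi(r_3 - \tfrac12)$.

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def g(y):
    """Mixing density of R_2, as defined after (4.8)."""
    return phi(y - 0.5) + phi(-y - 0.5)

def q1(r2):
    """Probability of the negative outcome under H1 in the experiment (-r2, r2)."""
    return (math.exp(r2) - 1.0) / (math.exp(r2) - math.exp(-r2))

def f(i, r3):
    """Density (4.9) of r3 under H_i, for i = 1 or 2."""
    r2 = abs(r3)
    if r3 == 0:
        return 0.5 * g(0.0)
    q = q1(r2) if i == 1 else math.exp(-r2) * q1(r2)
    return g(r2) * (q if r3 < 0 else 1.0 - q)

# Largest discrepancy from the claimed normal densities on a grid of r3 values.
grid = [k / 10 for k in range(-40, 41)]
max_err = max(
    max(abs(f(1, r) - phi(r + 0.5)), abs(f(2, r) - phi(r - 0.5)))
    for r in grid
)
print(max_err)
```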


5. Decomposition theorem for binary experiments. In the preceding examples, 
two binary experiments typical of those treated in mathematical statistics were 
shown to be mathematically equivalent to certain mixtures of specified simple 
binary experiments. The following theorem shows that every binary experiment 
can be decomposed in this sense into simple components. 

THEOREM. Each binary experiment is equivalent to a mixture of strictly ordered 
simple binary experiments. 

PROOF:

1. Let $v(u)$ be an arbitrary convex c.d.f. on the closed unit interval, $v(0) = 0$,
$v(1) = 1$, representing as above any given binary experiment $E$. $E$ has the
sufficient statistic $u$ with distributions

(5.1)    $\mathrm{Prob}\{U \le u \mid H_1\} = u = \int_0^u du$,   $\mathrm{Prob}\{U \le u \mid H_2\} = v(u)$,

and for $u < 1$, $v(u) = \int_0^u f_2(u)\,du$, where $f_2(u) = v'(u)$ is the right-derivative of
$v(u)$.

Let $h(u) = u - v(u)$ and $h^* = \sup\{h(u) \mid 0 \le u \le 1\}$. We have $h^* > 0$,
except in the case $v(u) = u$, $0 \le u \le 1$, which is the uninformative experiment
$(r_1, r_2) = (0, 0)$ for which the conclusion of the theorem holds trivially.
Assuming $h^* > 0$, the function $h(u)$ is concave, $h(0) = h(1) = 0$, $h(u) > 0$ for
$0 < u < 1$; $h(u)$ is continuous, except possibly at $u = 1$ corresponding to a possible
discontinuity of $v(u)$ at $u = 1$. If $v(u)$ is discontinuous at $u = 1$, we define $h(1)$
as multiple-valued, having all values in the closed interval $[1 - v(1-), 1]$;
then in all cases $h(u)$ is a graphically-continuous concave curve on the closed
unit interval. The right-derivative of $h(u)$ is $h'(u) = 1 - v'(u)$, for $u < 1$.

For each $h$, $0 \le h < h^*$, the equation $h(u) = h$ has two distinct roots which
we designate $u_1(h) < u_2(h)$. The equation $h(u) = h^*$ is satisfied on a closed
interval or at a single point $u$, which we designate by $u_1(h^*) \le u \le u_2(h^*)$,
$u_1(h^*) \le u_2(h^*)$. $u_1(h)$ is continuous, convex, and strictly increasing in $h$,
$0 \le h \le h^*$. $u_2(h)$ is continuous, concave, and nonincreasing; it is strictly
decreasing in $h$, for $1 - v(1-) \le h \le h^*$ (that is, for $0 \le h \le h^*$, unless $v(u)$ is
discontinuous), and $u_2(h) = 1$ for $0 \le h \le 1 - v(1-)$. Let $u_i'(h)$ denote
the respective right-derivatives of $u_i(h)$, for $0 \le h < h^*$; then

(5.2)    $u_i'(h) = [1 - f_2(u_i(h))]^{-1}$   for $0 \le h < h^*$, $i = 1, 2$.


Corresponding to each $h$, $0 \le h < h^*$, we define the simple binary experiment

(5.3)    $E_h$: $(r_1[h], r_2[h]) = (\log f_2(u_1(h)), \log f_2(u_2(h)))$.

Corresponding to $h = h^*$, we take $(r_1[h^*], r_2[h^*]) = (0, 0)$. These experiments
are clearly strictly ordered.







Let

(5.4)    $G(h) = \begin{cases} 1 - (u_2(h) - u_1(h)), & \text{for } 0 \le h < h^*, \\ 1, & \text{for } h = h^*. \end{cases}$

Let $g(h) = u_1'(h) - u_2'(h)$, for $0 \le h < h^*$. Then $G(h) = \int_0^h g(h)\,dh$ for
$0 \le h < h^*$.
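
As a concrete illustration of the construction in part 1, consider the hypothetical canonical form $v(u) = u^2$ (an example chosen here for convenience; it does not appear in the original). Then $f_2(u) = 2u$, $h(u) = u - u^2$, $h^* = 1/4$, and the roots $u_1(h)$, $u_2(h)$ are available in closed form. The sketch below computes the components (5.3) and the mixing c.d.f. (5.4), and compares (5.2) with finite differences; all function names are assumptions.

```python
import math

H_STAR = 0.25                      # sup of h(u) = u - u**2, attained at u = 1/2

def f2(u):
    """Right-derivative of the assumed example v(u) = u**2."""
    return 2.0 * u

def roots(h):
    """The two roots u_1(h) < u_2(h) of u - u**2 = h, for 0 <= h < h*."""
    s = math.sqrt(1.0 - 4.0 * h)
    return (1.0 - s) / 2.0, (1.0 + s) / 2.0

def G(h):
    """Mixing c.d.f. (5.4)."""
    if h >= H_STAR:
        return 1.0
    u1, u2 = roots(h)
    return 1.0 - (u2 - u1)

def component(h):
    """Simple experiment (5.3), (r_1[h], r_2[h]), for 0 < h < h*."""
    u1, u2 = roots(h)
    return math.log(f2(u1)), math.log(f2(u2))

def right_derivatives(h):
    """u_1'(h) and u_2'(h) as given by (5.2)."""
    u1, u2 = roots(h)
    return 1.0 / (1.0 - f2(u1)), 1.0 / (1.0 - f2(u2))

h, eps = 0.1, 1e-6
d1, d2 = right_derivatives(h)
fd1 = (roots(h + eps)[0] - roots(h)[0]) / eps   # finite-difference u_1'(h)
fd2 = (roots(h + eps)[1] - roots(h)[1]) / eps   # finite-difference u_2'(h)
fdG = (G(h + eps) - G(h)) / eps                 # finite-difference g(h)
r1, r2 = component(h)
print(r1, r2, G(h), d1 - d2, fdG)
```

For this example $G(h) = 1 - (1 - 4h)^{1/2}$, so the mixture is continuous with no atom at $h = h^*$.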

2. We define the experiment $E_G$ as the mixture $G = G(h)$ of the strictly ordered
simple binary experiments $E_h$: $(r_1[h], r_2[h])$, $0 \le h \le h^*$. We proceed to prove
that $E = E_G$, by proving that $v(u) = v_G(u)$, $0 \le u \le 1$, where $v_G(u)$ is the
canonical form of $E_G$.

For each $h < h^*$, the simple experiment $E_h$: $(r_1[h], r_2[h])$ is equivalent to an
experiment consisting of one observation on the random variable $U_h$ having the
following distributions:

(5.5)    $\mathrm{Prob}\{U_h = u_1(h) \mid H_i\} = q_i(h)$,
         $\mathrm{Prob}\{U_h = u_2(h) \mid H_i\} = p_i(h) = 1 - q_i(h)$,   $i = 1, 2$,

where $q_i(h)$, $i = 1, 2$, are determined by

$r_1[h] = \log [q_2(h)/q_1(h)]$,   $r_2[h] = \log [p_2(h)/p_1(h)]$.

For $h = h^*$, the experiment $(r_1[h^*], r_2[h^*]) = (0, 0)$ is equivalent to the trivial
experiment consisting of one observation on the random variable $U_{h^*}$ which has,
under $H_1$ and $H_2$, the same uniform distribution on the interval $[u_1(h^*), u_2(h^*)]$.
Let $E_H$ be the experiment in which one observation $h$ is taken on an auxiliary
randomization variable $H$ with the c.d.f. $G(h)$ defined above, independent of
the hypotheses, followed by one observation on the corresponding random
variable $U_h$ whose distributions under $H_1$, $H_2$ were given above. Each possible
outcome of this mixture experiment has the form $(h, u_h)$ where $h$ is the observed
value of $H$ and $u_h$ is the observed value of $U_h$. Clearly $E_H = E_G$.

For different values of $h$, the ranges of $U_h$ are disjoint; hence the observed
value $h$ is a function of the observed value $u_h$, and the latter is a sufficient
statistic for $E_G$. The distributions of the statistic $u_h$ are those of the random variable
$U_H$, which are determined as follows: Let $W_i(u) = \mathrm{Prob}\{U_H \le u \mid H_i\}$,
$0 \le u \le 1$, $i = 1, 2$. We have $W_i(1) = 1$; and since $\mathrm{Prob}\{H = 0\} = G(0) = 0$,
$W_i(0) = 0$, for $i = 1, 2$. For $0 < u < u_1(h^*)$, we have

(5.6)    $W_i(u) = \int_0^u w_i(u)\,du$,

where

$w_i(u) = g(h(u))\,q_i(h(u))/u_1'(h(u))$
        $= [u_1'(h(u)) - u_2'(h(u))]\,q_i(h(u))/u_1'(h(u))$.





Hence

(5.7)    $w_1(u) = [1 - u_2'(h(u))/u_1'(h(u))] \cdot [f_2(u_2(h(u))) - 1]/[f_2(u_2(h(u))) - f_2(u_1(h(u)))]$.

We have $u_1(h(u)) = u$, and for brevity we write here $u_2$ for $u_2(h(u))$, for
$0 < u < u_1(h^*)$. Thus

(5.8)    $w_1(u) = (1 - [1 - f_2(u)]/[1 - f_2(u_2)]) \cdot [f_2(u_2) - 1]/[f_2(u_2) - f_2(u)] = 1$.

Since $q_2(h(u)) = f_2(u)\,q_1(h(u))$, we have, for $0 < u < u_1(h^*)$,

(5.9)    $w_2(u) = f_2(u)$.

In the same way the same formulae for $w_i(u)$ can be verified for the range
$u_2(h^*) < u < 1$. If $\mathrm{Prob}\{H = h^*\} = 0$, $u_1(h^*) = u_2(h^*)$, and
$\mathrm{Prob}\{U_H = u_1(h^*) \mid H_i\} = 0$ for $i = 1, 2$. If $\mathrm{Prob}\{H = h^*\} > 0$, $u_1(h^*) <
u_2(h^*)$, and by definition we have, for $u_1(h^*) \le u \le u_2(h^*)$, $w_1(u) = w_2(u) =
f_2(u) = 1$. Thus $v_G(u) = \int_0^u f_2(u)\,du = v(u)$ for $0 \le u < 1$, and $v_G(1) = v(1) =
1$, completing the proof that $E_G = E$.
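
The identities $w_1(u) = 1$ and $w_2(u) = f_2(u)$ can be spot-checked for a particular case. The sketch below again assumes the hypothetical canonical form $v(u) = u^2$, for which $h(u) = u(1 - u)$, so that the second root of $h(u') = h(u)$ is simply $1 - u$ and $u_1(h^*) = 1/2$. It evaluates $w_i(u) = g(h)\,q_i(h)/u_1'(h)$ directly from the definitions of the proof; the function names are illustrative.

```python
def f2(u):
    """Density of U under H2 for the assumed example v(u) = u**2."""
    return 2.0 * u

def other_root(u):
    """u_2(h(u)) when h(u) = u * (1 - u) and u < 1/2."""
    return 1.0 - u

def w(i, u):
    """Density w_i(u) of U_H under H_i, for 0 < u < u_1(h*) = 1/2."""
    a, b = f2(u), f2(other_root(u))   # likelihood ratios at u_1(h) and u_2(h)
    du1 = 1.0 / (1.0 - a)             # u_1'(h), by (5.2)
    du2 = 1.0 / (1.0 - b)             # u_2'(h), by (5.2)
    q1 = (b - 1.0) / (b - a)          # Prob{U_h = u_1(h) | H1}
    q = q1 if i == 1 else a * q1      # q_2(h) = f_2(u) q_1(h)
    return (du1 - du2) * q / du1      # g(h) q_i(h) / u_1'(h)

grid = [k / 100 for k in range(5, 50, 5)]
err1 = max(abs(w(1, u) - 1.0) for u in grid)
err2 = max(abs(w(2, u) - f2(u)) for u in grid)
print(err1, err2)
```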


B. INFERENCE METHODS WITH PROBABILISTIC JUSTIFICATIONS. 


6. On the mathematical treatment of statistical inference problems. It is 
usual in modern mathematical statistics to restrict consideration to inference 
problems formulated on the basis of specified statistical experiments E in which 
the possible probability distributions of outcomes are described and delimited. 
(This includes problems of experimental design, which concern the appraisal 
and comparison of alternative possible experiments.) Moreover, it is now usual 
to consider such a specified statistical experiment to be the essential and basic 
frame of reference in which the relevant properties of any inference techniques 
must be defined and interpreted; for example, the basic properties of techniques 
of testing statistical hypotheses, and of related estimation techniques, are 
various error-probabilities, each defined directly as a probability in a specified
experiment $E$, and interpreted in terms of relative frequencies of errors in
conceptually-possible repetitions of $E$. Inference problems and techniques as they
may be discussed outside such frames of reference are usually considered vague, 
and lacking in objectivity and usefulness. 

The preceding sections have treated the mathematical structure of statistical 
experiments E in the binary case, and have left aside the remaining aspects of an 
inference situation, which include 

(a) the conclusions or decisions among which a choice must be made on the 
basis of an observed outcome $x$ of experiment $E$;

(b) the consequences of each possible choice, on the respective assumptions 
that each of the simple hypotheses is true; and 

(c) the evaluations of such consequences by the individual in the inference
situation; his purposes; and possibly his prior opinions or information
concerning the hypotheses.

The specification of these additional aspects of an inference situation in 
appropriate and formal terms is often difficult or problematical, even when all 
of the general features of the inference situation are quite clear. 

If at least aspect (a) can be specified definitely, as for example that just two 
conclusions or decisions are allowed, then it is possible to give an analysis of the 
inference problem having general usefulness in connection with various formal 
or informal specifications of the remaining aspects (b) and (c). 


7. Tests of statistical hypotheses; two-decision problems. If it is specified
that one of just two conclusions or decisions must be adopted on the basis of an
outcome of $E$, with specified $v(u)$, we may denote by $d_1$ that conclusion or
decision which would be more appropriate if $H_1$ were true, and by $d_2$ the
alternative, which may be called “reject $H_1$.” Then each (Lebesgue measurable)
function $d = d(u)$, taking values $d_1$ or $d_2$ only, represents a possible inference
rule, whose relevant properties are the error-probabilities

(7.1)    $\alpha = \mathrm{Prob}(d(U) = d_2 \mid H_1)$,
         $\beta = \mathrm{Prob}(d(U) = d_1 \mid H_2)$.

For each $\alpha$, $0 \le \alpha \le 1$, let $d_\alpha(u) = d_2$ if $u \ge 1 - \alpha$; let $d_\alpha(u) = d_1$ if $u < 1 - \alpha$.
Then the error-probabilities of $d_\alpha(u)$ are $\alpha$ and

(7.2)    $\beta = \beta(\alpha) = \begin{cases} v(1 - \alpha), & \text{for } 0 < \alpha \le 1, \\ v(1-), & \text{for } \alpha = 0. \end{cases}$

Since the likelihood ratio statistic of $E$ is $v'(u)$, a non-decreasing function of $u$,
we have by the fundamental lemma of Neyman and Pearson that $d_\alpha(u)$ is a
best test of $H_1$ against $H_2$ of significance level $\alpha$.

Let $\alpha' = \min [\alpha \mid \beta(\alpha) = 0] = 1 - \max [u \mid v(u) = 0]$. The inference
functions $d_\alpha(u)$, $0 \le \alpha \le \alpha'$, constitute a minimal essentially complete class of
(admissible) inference functions. For the problem considered, on the basis of
the given experiment $E$, no other inference functions need be given
consideration; but no further analysis or simplification of the problem of choosing one of
these inference functions can be given except in relation to formal or informal
specifications of the aspects (b) and (c) of the inference situation referred to in
the preceding Section.
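
For a canonical form that is piecewise linear, such as the $v(u)$ curve of the binomial experiment of Example 1, the error-probability function of (7.2) reduces to linear interpolation. The following is a minimal sketch under that assumption; the vertex list is the one quoted in Example 1 (four-decimal values), the names `VERTICES`, `v`, `beta` and `d` are illustrative, and since this $v$ is continuous the separate case $\alpha = 0$ of (7.2) needs no special handling.

```python
# Vertices of v(u) for the binomial experiment of Example 1 (four-decimal values).
VERTICES = [(0.0, 0.0), (0.4096, 0.0016), (0.8192, 0.0272),
            (0.9728, 0.1808), (0.9984, 0.5904), (1.0, 1.0)]

def v(u):
    """Piecewise-linear canonical form through VERTICES."""
    for (u0, v0), (u1, v1) in zip(VERTICES, VERTICES[1:]):
        if u0 <= u <= u1:
            return v0 + (v1 - v0) * (u - u0) / (u1 - u0)
    raise ValueError("u must lie in [0, 1]")

def beta(alpha):
    """Type II error probability (7.2) of the rule d_alpha."""
    return v(1.0 - alpha)

def d(alpha, u):
    """Rule d_alpha: returns 2 for d_2 ('reject H1') and 1 for d_1."""
    return 2 if u >= 1.0 - alpha else 1

for a in (0.0016, 0.0272, 0.05, 0.1808):
    print(a, round(beta(a), 4))
```

Values of $\alpha$ that fall between vertices correspond to randomized tests in terms of the original binomial outcome.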


8. Multi-decision problems; tests based on critical levels. To illustrate most 
simply that even with a binary experiment it is sometimes appropriate to allow 
more than two possible decisions (or conclusions), consider the case in which 
three decisions may be allowed. Assume that decision $d_1$ would be the most
appropriate of the three possibilities, and that $d_2$ would be the least appropriate,
if $H_1$ were true; and that $d_1$ would be least appropriate, and $d_2$ most appropriate,
if $H_2$ were true; the remaining decision, $d_3$, is then more appropriate than $d_2$ if
$H_1$ is true, and more appropriate than $d_1$ if $H_2$ is true. An example would be a
situation of industrial acceptance sampling in which it is assumed that each lot
of items contains either a certain small proportion of defective items ($H_1$) or a
certain higher proportion of defective items ($H_2$); and the possible
classifications are: $d_1$, “apparently high quality”; or $d_2$, “apparently low quality”; or
$d_3$, “indeterminate quality”. Another type of example is represented by
designating $d_1$ as the conclusion “reject $H_2$ (in favor of $H_1$),” and $d_2$ as the conclusion
“reject $H_1$ (in favor of $H_2$),” and $d_3$ as the conclusion “reject neither hypothesis”
or “no conclusion.”

Any inference procedure here can be represented by some function $d(u)$,
defined on the unit interval, taking values $d_1$, $d_2$ or $d_3$. The relevant properties
of any such function are just the four error-probabilities $\alpha_i$, $\beta_i$, $i = 1, 2$, where

(8.1)    $\alpha_1 = \mathrm{Prob}[d(U) = d_2 \mid H_1]$ = the probability of a “major Type I error,”
         $\alpha_2 = \mathrm{Prob}[d(U) = d_3 \mid H_1]$ = the probability of a “minor Type I error,”
         $\beta_1 = \mathrm{Prob}[d(U) = d_1 \mid H_2]$ = the probability of a “major Type II error,” and
         $\beta_2 = \mathrm{Prob}[d(U) = d_3 \mid H_2]$ = the probability of a “minor Type II error.”

Clearly the general goal, in appraising and selecting an inference function based
on a given binary experiment, is that each of these error-probabilities should be
suitably small. If the function $\beta(\alpha)$ is defined as above, then for any values of
$\alpha_1$ and $\alpha_2$ such that $0 \le \alpha_1 + \alpha_2 \le \alpha'$ (no other cases should be considered), we
have (by the Neyman-Pearson lemma) that the smallest possible value of $\beta_1$ is
$\beta(\alpha_1 + \alpha_2)$, and the smallest possible value of $\beta_2$ is $\beta(\alpha_1) - \beta(\alpha_1 + \alpha_2)$; and that
these are the error-probabilities of the admissible three-decision function:

(8.2)    $d(u) = \begin{cases} d_1, & \text{if } u < 1 - \alpha_1 - \alpha_2, \\ d_3, & \text{if } 1 - \alpha_1 - \alpha_2 \le u < 1 - \alpha_1, \\ d_2, & \text{if } 1 - \alpha_1 \le u. \end{cases}$


Comments like those of the preceding Section apply to the problem of choice of 
a particular inference function of this form. Any inference or decision function 
of this form has the probabilistic justification that its four error-probabilities are 
“jointly minimum” in the sense that no one of them could be reduced except by
an increase in one or more of the others. The policy of using such an inference or 
decision function, having suitably small error-probabilities, is thereby justified 
in the sense that in many independent applications, under respective hypotheses, 
the relative frequencies of the more and less serious errors of various kinds will 
tend to be correspondingly small. 
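
The three-decision function (8.2) and its four error-probabilities can be sketched for the binomial experiment of Example 1. This is an illustration only: the vertex list is taken from Example 1, the levels $\alpha_1 = .0016$ and $\alpha_2 = .0256$ are chosen here so that the cut points fall on vertices of $v(u)$, and the function names are assumptions.

```python
# Vertices of v(u) for the binomial experiment of Example 1 (four-decimal values).
VERTICES = [(0.0, 0.0), (0.4096, 0.0016), (0.8192, 0.0272),
            (0.9728, 0.1808), (0.9984, 0.5904), (1.0, 1.0)]

def v(u):
    """Piecewise-linear canonical form through VERTICES."""
    for (u0, v0), (u1, v1) in zip(VERTICES, VERTICES[1:]):
        if u0 <= u <= u1:
            return v0 + (v1 - v0) * (u - u0) / (u1 - u0)
    raise ValueError("u must lie in [0, 1]")

def beta(alpha):
    """Error-probability function of (7.2); v is continuous in this example."""
    return v(1.0 - alpha)

def decide(u, a1, a2):
    """Three-decision rule (8.2); returns 1, 2 or 3 for d_1, d_2, d_3."""
    if u < 1.0 - a1 - a2:
        return 1
    if u < 1.0 - a1:
        return 3
    return 2

def error_probabilities(a1, a2):
    """(alpha_1, alpha_2, beta_1, beta_2) of the rule (8.2)."""
    b1 = beta(a1 + a2)          # major Type II error: d_1 chosen under H2
    b2 = beta(a1) - b1          # minor Type II error: d_3 chosen under H2
    return a1, a2, b1, b2

a1, a2, b1, b2 = error_probabilities(0.0016, 0.0256)
print(a1, a2, round(b1, 4), round(b2, 4))
```

With these levels the rule chooses $d_2$ only for $x = 4$, $d_3$ for $x = 3$, and $d_1$ otherwise.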







The preceding discussion can be immediately generalized to allow any number
of possible decisions or conclusions, simply ordered according to their decreasing
appropriateness if $H_1$ is true (and increasing appropriateness if $H_2$ is true); an
infinite number (not necessarily countable) can be allowed. In all such cases,
the admissible inference or decision functions, having probabilistic justifications
of the kind illustrated above, will have a form in which larger values of the
outcome $u$ tend to indicate conclusions or decisions which are more appropriate
when $H_2$ is true.

An inference technique which antedates modern mathematical statistics, and
which remains in wide use, is that based on the critical level associated with an
observed outcome: When an appropriate statistic has been selected, for example
the statistic $u$, the critical level is defined as the probability, under a hypothesis
$H_1$ being tested, of a value of $U$ at least as large as the value observed:

(8.3)    $\alpha(u) = \mathrm{Prob}[U \ge u \mid H_1]$.

Observed values of $\alpha(u)$ more or less close to 0 are customarily interpreted as
representing more or less strong evidence for rejection of $H_1$; one convention of
interpretation, which is clearly rather schematic, applies the term “significant”
to outcomes $\alpha(u) \le .05$, and the term “highly significant” to outcomes $\alpha(u) \le
.01$. Leaving aside interpretations which ascribe to a numerical value of $\alpha(u)$ some
intrinsic meaning as a quantitative measure of strength of evidence against $H_1$
in an outcome $u$, there remains the qualitative simple ordering of conclusions
with those favoring $H_2$ more strongly corresponding to smaller values of $\alpha(u)$.

This latter qualitative part of the customary interpretation of various possible
values of the critical level, considered in the context of a specified experiment,
has the kind of probabilistic and frequency justification described above. In
addition, the numerical values of $\alpha(u)$ have probabilistic interpretations related
to various errors of Type I; for example, any interpretation of outcomes $\alpha(u) \le
.01$ as “strong evidence against $H_1$” will be highly inappropriate if $H_1$ is true,
but will be made with probability only .01 when $H_1$ is true. However, techniques
based upon critical values do not incorporate systematic consideration of
error-probabilities under $H_2$.

While the theory of Neyman and Pearson introduced the essential comple- 
mentary concept of errors of Type II, the formal development and the applica- 
tions of this theory have typically been based on fixed-level formulations, and 
have typically treated only two-decision problems. The preceding discussion 
shows that a simple adaptation of the standard fixed-level theory and methods 
gives multi-decision and corresponding inference methods which have the 
flexibility and intuitive appeal of the traditional critical level technique, and 
also an appropriately complete objective probabilistic appraisal and justification 
based on consideration of probabilities of errors of all kinds and degrees, in the 
context of a specified statistical experiment. 







C. INFERENCE METHODS WITH INTRINSIC JUSTIFICATIONS. 


9. Informative inference. A traditional and basic type of application of tech- 
niques of mathematical statistics, including techniques described in the preced- 
ing two sections, occurs in situations of empirical scientific research. In such 
situations, besides problems of inference or decision-making which bear upon 
specific practical purposes, or specific research purposes such as drawing working 
conclusions and planning further research, a broader inference problem is often 
recognized and dealt with. The latter problem is that of recognizing, appraising, 
and sometimes reporting in the scientific or technical literature, in appropriate 
objective terms, the general character of experimental results as they are relevant 
to statistical hypotheses (or values of unknown parameters) of interest. This 
problem may be described as that of recognizing, and reporting appropriately, 
statistical evidence relevant to statistical hypotheses of interest. For brevity, we 
use the term informative inference to refer to this problem and to methods for 
dealing with it. 

In typical research situations, when a test of a statistical hypothesis (appropri- 
ately valid and efficient) indicates rejection of that hypothesis, besides the 
conclusions or decisions which the experimenter may reach it is often recognized 
that the experimental results may be of more general interest and value; and a 
description of the testing procedure and its outcome are often reported to indicate 
in objective terms the character of the results as evidence relevant to hypotheses. 
The reporting of estimates of parameters of interest with indicators of their 
precisions, in the scientific literature, typically serves the same broad and basic 
scientific function. In this function, the methods of mathematical statistics 
serve as techniques for the evidential interpretation of experimental outcomes. 

The basic terms of such interpretations are usually taken to be certain error- 
probabilities associated with the testing or estimation techniques used. (The 
precision of an estimator can typically be interpreted in terms of probabilities 
of estimation-errors of various magnitudes.) In fact the general nature of statis- 
tical evidence, relevant to hypotheses of interest, is commonly recognized, 
expressed, and dealt with, in a generally clear and effective way, in terms of 
such error-probabilities. Our purpose in the following sections is to clarify the
mathematical structure of statistical evidence and the terms appropriate for its 
description. 


10. Symmetric simple binary experiments. It is convenient to refer to the
outcome $r_2$ of any simple binary experiment $(r_1, r_2)$ as “positive,” and to the
outcome $r_1$ as “negative.” A simple binary experiment will be called symmetric
if $r_1 = -r_2$, that is, if the experiment is of the form $(-r_2, r_2)$; in the present
section we consider only experiments of this form. Each such experiment is
characterized by a number $r_2$, $0 \le r_2 \le \infty$. This class of experiments is simply
ordered, by the parameter $r_2$, according to the relation “more informative than”
defined in Section 3 above.







There is no difficulty in recognizing the appropriate evidential interpretations
of outcomes of the extreme cases in this class of experiments. The completely
informative experiment $(-\infty, \infty)$ gives outcomes each of which can naturally
be called completely informative: the outcome $r = \infty$ supports the certain
inference that $H_1$ is false and $H_2$ is true. An alternative interpretation, which is
equivalent for all purposes of application, is: the inference that $H_2$ is true is
practically certain, in the highest possible degree. Similarly, the outcome $r =
-\infty$ supports the certain inference that $H_1$ is true. The uninformative
experiment (0, 0) gives outcomes each of which can naturally be called (completely)
uninformative: an outcome $r = 0$ has no relevance to the hypotheses, and
therefore gives no support in any degree to any inferences concerning the hypotheses.

In any intermediate case $(-r_2, r_2)$, $0 < r_2 < \infty$, it is natural and necessary
to attribute to the positive outcome the qualitative evidential property of
supporting $H_2$ (as against $H_1$), and to the negative outcome the property of
supporting $H_1$. In addition to intrinsic plausibility, these qualitative evidential
properties attributed to the possible numerical values of $r$, $r_2$ or $-r_2$, have the
objective interpretation and justification that, under each hypothesis, the
probability that such an interpretation will be qualitatively inappropriate (the
probabilities of a “false positive” (Type I error) and of a “false negative” (Type
II error), in the obvious simplest testing or two-decision procedure) is equal to

(10.1)    $\alpha = \alpha[r_2] = 1/(1 + e^{r_2}) < \tfrac12$.

If $0 < r_2' < r_2'' < \infty$, we interpret the positive outcome of the experiment
$(-r_2'', r_2'')$ as supporting $H_2$ more strongly than the positive outcome of the
experiment $(-r_2', r_2')$. This interpretation is supported by the considerations that
outcomes statistically equivalent to those of the latter experiment can be generated
by modifying outcomes from the former experiment by the “addition of pure
noise” unrelated to the hypotheses, in the sense of Section 3 above; and that
$\alpha[r_2''] < \alpha[r_2']$, since $\alpha[r_2]$ decreases from $\tfrac12$ to 0 as $r_2$ increases from 0 to $\infty$.

In summary, over the class of symmetric simple binary experiments, the
function $r = \log [f_2(x)/f_1(x)]$ has been given an unequivocal and consistent set
of evidential interpretations: $r = r(x)$ is an objective, internally-consistent and
efficient indicator of evidence relevant to hypotheses in experimental outcomes.


11. Symmetric binary experiments. A binary experiment $E$, not necessarily
simple, will be called symmetric if its canonical form $v(u)$ is symmetric about the
line $u + v = 1$; that is, if for each $u$, $0 \le u \le 1$, we have $v(1 - v(u)) = 1 - u$.
For any such experiment, the method of the proof of the decomposition theorem
of Section 5 above gives a mixture experiment, equivalent to $E$, each of whose
simple components has the symmetric form $(-r_2, r_2)$; as in Example 2 of Section
4 above, any such mixture can be represented by a (generalized) c.d.f. $G(r_2)$ on
the range $0 \le r_2 \le \infty$. For any given symmetric binary experiment $E$, let $E_G$
denote this equivalent mixture experiment.

Since $E_G$ and $E$ are mathematically equivalent, in particular for purposes of
informative inference, and related questions of evidential interpretations of
outcomes, we can consider any outcome $r$ of $E$ as if it were a
mathematically-corresponding outcome of $E_G$. Each outcome of this symmetric mixture experiment
$E_G$ has the form $(-r_2, r_2, r)$, where $r = r_2$ or $-r_2$. Since $r_2$ is the observed
value of a random variable having under each hypothesis the same known
distribution $G(r_2)$, the observed value $r_2$ is irrelevant as evidence concerning the
hypotheses. The observed value $r_2$ determines the symmetric simple binary
experiment $(-r_2, r_2)$ which is performed; hence $r_2 = |r|$ indicates, as in the
preceding Section, just the strength of the evidence which is provided by the
outcome $r$ of the experiment $(-r_2, r_2)$. It is possible and necessary to interpret the
outcome $r$ of the latter experiment in the way established in the preceding section
for outcomes of symmetric simple binary experiments, for purposes of
informative inference, since the appropriate frame of reference for considering the
evidential character of $r$ is clearly the selected simple experiment, and the structure
of $E_G$ is otherwise clearly irrelevant to such interpretations.

Because of the equivalence of $E$ and $E_G$, and the related equivalence between
outcomes of the two respective experiments having numerically equal values $r$,
we obtain from the preceding paragraph the following general conclusions:
Given any symmetric binary experiment $E$, for purposes of informative inference,
any outcome $r$ of $E$ must be interpreted evidentially in the same way as a
numerically-equal outcome of a symmetric simple binary experiment. In particular, given $r$, the
mathematical form of $E$ is irrelevant for such purposes and interpretations.

To illustrate this conclusion in concrete terms, a physical interpretation of
Example 1 of Section 4 above may be useful. Suppose that four measurement
instruments (or techniques of observation) are available in an investigation
concerning two hypotheses, with each instrument giving dichotomous outcomes
“positive” or “negative,” and each instrument symmetric in the sense that it has
equal probabilities $\alpha$ of false positives and of false negatives. Let the simple
experiments $E_0$, $E_1$, $E_2$ defined in Example 1 represent respectively three of
these instruments, when each is used without replication (to obtain a single
observation). Let the fourth instrument have $\alpha = .2$, and let $E$ denote the
experiment consisting of four independent measurements by this instrument;
then $E$ is the binomial experiment of Example 1.

Let $E_G$ denote an experimental procedure in which one of the first three
instruments is selected at random, with the respective probabilities $g_i$ given in
the Example, and in which the instrument selected is used to obtain a single
measurement. With this procedure, if the worthless instrument $E_0$ happens to
be selected, one may fairly plead victimization by rather improbable bad luck,
and indeed one had good reason to hope for and expect selection of a more
informative instrument; however, these considerations are irrelevant to the
problem of making informative inferences from a measurement provided by $E_0$ to the
hypotheses; for this problem, the only relevant considerations are that the
instrument and its measurements are strictly worthless, and that this outcome
of the experiment $E_G$ provides, recognizably, no contribution whatsoever to the
inference problem.







In terms of the binomial experiment $E$, the outcome $x = 2$ corresponds (under
the mathematical equivalence of $E$ with $E_G$) to the selection of $E_0$ (and
occurrence of either of its outcomes) in $E_G$. Hence there is no reason to give the
binomial outcome, $x = 2$, interpretations differing in any respect from the
interpretations just described for the outcome $E_0$ of $E_G$. Nor is there any reason to
consider any other aspect of the binomial model of the experiment $E$, for
purposes of informative inference, given that $r(x) = r(2) = 0$, a recognizably
(completely) uninformative outcome.

Suppose, alternatively, that in the mixture experiment $E_G$ the most
informative instrument, $E_2$, is by good fortune selected. Granting that the occurrence
of such good luck is irrelevant as evidence regarding the hypotheses, it is most
relevant to the quality or strength of inferences which may be made from a
measurement supplied by $E_2$. Evidently there is no reason to qualify or weaken
the resulting inference statements on the ground that one was not sure
beforehand that one would have the good luck to be able to use the best possible
instrument. Suppose that use of the selected instrument $E_2$ gives a positive
outcome, $r = \log 256$. Under the mathematical equivalence between $E$ and $E_G$,
this outcome corresponds to the outcome $x = 4$ of the binomial experiment $E$
(that is, to four positive outcomes in four independent measurements by the
instrument having $\alpha = .2$), for which we also (necessarily) have $r(x) = \log 256$.
It follows that the outcome $x = 4$ of the binomial experiment $E$ should be
interpreted in exactly the same way, as evidence relevant to the hypotheses, as if it
were a positive outcome obtained in a single measurement by an instrument $E_2$
having probability $\alpha = .0039$ of false positives and of false negatives. The
numerical value $r = \log [(1 - \alpha)/\alpha] = \log 256$ serves, by definition, as a compact
abbreviation for such an evidential interpretation of the outcome $x = 4$.
Analogous interpretations apply to the remaining possible outcomes $x$ of $E$.
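
The correspondence between binomial outcomes and single measurements by symmetric instruments can be tabulated directly. The sketch below is illustrative, with assumed function names: it computes $r(x) = \log [f_2(x)/f_1(x)]$ for each outcome of the binomial experiment with $\theta = .2$ or $.8$, and the error probability $\alpha = 1/(1 + e^{|r|})$ of the symmetric simple experiment $(-|r|, |r|)$, following (10.1).

```python
import math

def log_lr(x, n=4, theta1=0.2, theta2=0.8):
    """r(x) = log[f2(x)/f1(x)] for x positive outcomes in n binomial trials."""
    return (x * math.log(theta2 / theta1)
            + (n - x) * math.log((1.0 - theta2) / (1.0 - theta1)))

def alpha(r):
    """Error probability of the symmetric simple experiment (-|r|, |r|), as in (10.1)."""
    return 1.0 / (1.0 + math.exp(abs(r)))

# One row per binomial outcome: (x, r(x), alpha of the equivalent instrument).
table = [(x, log_lr(x), alpha(log_lr(x))) for x in range(5)]
for x, r, a in table:
    print(x, round(r, 4), round(a, 4))
```

The outcomes $x = 0$ and $x = 4$ both map to the instrument with $\alpha = 1/257$, $x = 1$ and $x = 3$ to the one with $\alpha = 1/17$, and $x = 2$ to the uninformative instrument with $\alpha = 1/2$; the sign of $r(x)$ indicates which hypothesis is supported.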


12. Binary experiments in general. To extend the scope of the preceding
evidential interpretations of the statistic $r$ to binary experiments which are not
necessarily symmetric, let $E$: $v(u)$ be any binary experiment. Let $E^*$: $v^*(u)$ be
the “reflection” of $v(u)$ in the line $u + v = 1$; that is, for each point $(u', v(u'))$
of the (continuous) graph of $v(u)$, let the graph of $v^*(u)$ contain the point
$(u'', v^*(u'')) = (1 - v(u'), 1 - u')$. Let $E^{**} = \tfrac12 E \oplus \tfrac12 E^*$; that is, $E^{**}$ is the
mixture experiment having $E$ and $E^*$ as components with probabilities each $\tfrac12$.
Then $E^{**}$ is a symmetric binary experiment. If the experiment $E^{**}$ were under
consideration, and if its component $E$ were selected, then any outcome $r$ of $E$
must be interpreted evidentially in the way described in the preceding Section,
since $E^{**}$ is symmetric; the selection of $E$ is irrelevant here, given the numerical
value of $r$.

Returning to consideration of the given experiment $E$, any outcome $r$ of $E$ is
equivalent, for purposes of informative inference, to an outcome of the mixture
experiment $E^{**}$ in which the component $E$ is first selected, and then the outcome
$r$ is observed. It follows that the evidential interpretations of outcomes $r$ of any
binary experiment must be of the same kinds as those given in the cases
discussed in the preceding Section.
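
The reflection construction can be illustrated with a discrete, non-symmetric experiment. The sketch below takes $E_3$ of Table 1, using exact probabilities assumed here to match $(L_1, L_2) = (1/16, 256)$, and builds a candidate for $E^*$ by interchanging the roles of the two hypotheses. The code then checks, rather than presupposes, that this candidate has the reflected graph and that the equal mixture is symmetric. The helper names are hypothetical.

```python
from fractions import Fraction

def vertices(outcomes):
    """Vertices of v(u) for outcomes given as (prob under H1, prob under H2) pairs."""
    ordered = sorted(outcomes, key=lambda ab: ab[1] / ab[0])  # increasing likelihood ratio
    u = v = Fraction(0)
    pts = [(u, v)]
    for a, b in ordered:
        u, v = u + a, v + b
        pts.append((u, v))
    return pts

def reflect(pts):
    """Reflect points in the line u + v = 1: (u, v) -> (1 - v, 1 - u)."""
    return sorted((1 - v, 1 - u) for u, v in pts)

# E_3 of Table 1; exact values assumed to match (L1, L2) = (1/16, 256).
E = [(Fraction(272, 273), Fraction(17, 273)),    # negative outcome
     (Fraction(1, 273), Fraction(256, 273))]     # positive outcome
E_star = [(b, a) for a, b in E]                   # roles of H1 and H2 interchanged
E_2star = [(a / 2, b / 2) for a, b in E + E_star] # equal mixture of E and E*

pts_E = vertices(E)
pts_star = vertices(E_star)
pts_2star = vertices(E_2star)
print(reflect(pts_E) == pts_star, reflect(pts_2star) == pts_2star)
```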







13. Inferences based on the likelihood function. The results of the preceding 
analysis may be summarized as follows: When any binary experiment EF is used 
for purposes of informative inference, and when any specified outcome r of £ is 
obtained, the mathematical structure of Z is then irrelevant to those purposes, 
and just the numerical value r is relevant. Any such observed numerical value r 
has an intrinsic objective probabilistic character as evidence relevant to H, or 
H, ; namely: (a) the qualitative property that the outcome favors H, if r is 
positive, favors H, if r is negative, and is irrelevant as evidence if r = 0; and 
(b) strength, as evidence, identical with that of a single outcome of the sym- 
metric simple binary experiment (—r2, 72), where rz = |r|. The latter simple 
experiment has probabilities of false positives and of false negatives each equal to 


(13.1) a = af|ri] = 1/(1 +e") s 3. 


We may say that such inferences are based just on the likelihood function
$[f_1(x), f_2(x)]$ on the observed outcome $x$, since $r(x)$ is a compact representation
of the likelihood function in the case of any binary experiment.
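
As a small numerical illustration of (13.1), the following sketch evaluates the error-probability attached to an observed value $r$. The function name is hypothetical and chosen only for this example; nothing beyond formula (13.1) is assumed.

```python
import math


def intrinsic_error_probability(r: float) -> float:
    """Error-probability alpha(|r|) = 1 / (1 + exp(|r|)) of equation (13.1).

    Only the absolute value of r enters, so outcomes r and -r have the
    same evidential strength (they differ in which hypothesis is favored).
    """
    return 1.0 / (1.0 + math.exp(abs(r)))


# r = 0 carries no evidence: the associated error-probability is 1/2.
alpha_at_zero = intrinsic_error_probability(0.0)

# |r| = log 99 corresponds to an error-probability of about .01.
alpha_at_log99 = intrinsic_error_probability(math.log(99.0))
```

The value $|r| = \log 99$ is the one used as a target strength in the design example of the next section.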

If any evidential interpretations of observed values of $r(x)$ are regarded within
the frame of reference of the specific binary experiment $E$ from which $x$ is
obtained, then we have formally a particular case of the procedures discussed in
Section 8 above. However, such evidential interpretations of outcomes $r(x)$,
despite their objective aspects, are in general deficient for purposes of
informative inference to the extent that they differ from the evidential interpretations
of the likelihood function described above.


14. Appraisal and design of experiments for informative inference. Granting
that the structure of a binary experiment is irrelevant to the evidential
interpretation of an outcome $x$, apart from determination of $r(x)$, there remain the
important problems of appraising, comparing, and designing experiments for
purposes of informative inference. Here the structure of an experiment is most
relevant, and the partial ordering discussed above is basic: Error-probability
curves $\beta(\alpha)$ (and their analogues in more complicated experiments) which have
been studied extensively in modern mathematical statistics, although usually
given other interpretations, are of direct use for such purposes. No simple
ordering of experiments, nor numerical measure of information in outcomes or
experiments, seems adequate for such purposes in general (although possibly useful in
a large-sample approximate sense), since the evidential meanings and values
of numerical values $r(x)$ are primitive (although objective) and the distributions
of $r(X)$ can in principle be considered directly.

As an example of experimental design problems for informative inference,
suppose that for two simple hypotheses it is required to obtain as economically
as possible statistical evidence with strength represented by $|r(x)| \ge \log 99$. If
repeated independent observations $Y_i$ are available, with densities $g_1(y)$, $g_2(y)$
under the respective hypotheses, and if costs depend only upon the number of
observations (increasing with the latter in any way), it follows immediately
that the most economical experimental design is given by the sequential sampling





434 ALLAN BIRNBAUM 


rule which terminates when for the first time the observations taken, $x =
(y_1, \cdots, y_n)$, satisfy $|r(x)| \ge \log 99$. Such sampling rules are the same as Wald's,
given for the problem of sequential testing between two simple hypotheses.
(The elementary determination of this rule as best for informative inference
contrasts sharply with the difficult proof of its optimality for the testing
problem.) If indefinitely large sample sizes are not allowed, even with small
probability, the specification of the problem must be altered.
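
The sequential sampling rule just described can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name, the convention that $r$ is the accumulated logarithm of $g_2(y)/g_1(y)$, and the normal densities in the usage lines are all assumptions made for the example.

```python
import math
from typing import Callable, Iterable, Tuple


def sample_until_strength(
    observations: Iterable[float],
    log_density_ratio: Callable[[float], float],
    threshold: float = math.log(99.0),
) -> Tuple[int, float]:
    """Take observations one at a time until |r| >= threshold.

    r is the running sum of log(g2(y) / g1(y)) over the observations
    taken so far.  Returns (number of observations used, final r).
    If the stream ends first, the threshold has not been reached and
    the returned r satisfies |r| < threshold.
    """
    r = 0.0
    n = 0
    for y in observations:
        n += 1
        r += log_density_ratio(y)
        if abs(r) >= threshold:
            break
    return n, r


# Assumed example: g1 = N(0, 1) and g2 = N(1, 1), so log(g2/g1)(y) = y - 1/2.
def normal_shift_log_ratio(y: float) -> float:
    return y - 0.5


# Ten identical observations y = 1.5 each add exactly 1.0 to r, and
# log 99 is about 4.595, so sampling stops after the fifth observation.
n_used, r_final = sample_until_strength([1.5] * 10, normal_shift_log_ratio)
```

Because the stopping condition looks only at $|r|$, the same rule serves whichever hypothesis the accumulated evidence turns out to favor.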


15. Relations between statistical evidence and significance tests. Let $E$ be
any informative binary experiment, $v(u) \not\equiv u$, and for some $\alpha$, $0 < \alpha < \alpha'$,
let $d_\alpha(u)$ be the best test of level $\alpha$ as defined in Section 8 above. Then as above
this test has $\beta = 1 - v(1 - \alpha)$, $0 < \beta < 1$. If outcomes of $E$ are reported only
in the form, either $d_2$: “reject $H_1$”; or $d_1$: “do not reject $H_1$” (or “accept $H_1$”),
then this significance test procedure is equivalent to the simple binary
experiment $E'$ in which the likelihood ratio statistic $L$ has only the two possible values
$L_1 = \beta/(1 - \alpha) < 1$ (for $d_1$) or $L_2 = (1 - \beta)/\alpha > 1$ (for $d_2$). Hence the outcome
“reject $H_1$” has strength, as evidence, corresponding to the value $L_2$ of the
likelihood ratio statistic, and is associated intrinsically in the sense of Section
14 above with the error-probability $\alpha^* = 1/(1 + L_2) = \alpha/(1 - \beta + \alpha)$.

If the ratio $L_2$ of the test's power $(1 - \beta)$ to its level $\alpha$ is not far above unity,
then $\alpha^*$ is not far below .5, and the evidential strength of the outcome “reject
$H_1$” is correspondingly slight; this can be the case within wide limits, for any
value of $\alpha$, including very small values. Thus in binary experiments a small value
of $\alpha$ does not in general imply high evidential strength in the outcome “reject $H_1$”,
and the determination of the evidential strength of such an outcome depends
upon $\beta$ as well as $\alpha$, through the function $L_2 = (1 - \beta)/\alpha$. (Within a specified
binary experiment, if $\alpha$ is decreased, then $L_2$ is increased, at least if $v(u)$ is
strictly convex; however the upper limit approached by $L_2$ as $\alpha$ decreases may
or may not be far above unity.)

On the other hand, if $\beta$ is appreciably below .5, then small values of $\alpha$
correspond to similarly small values of $\alpha^*$, the error-probability intrinsically
associated with the outcome “reject $H_1$.” For example, $\beta < .25$ implies $\alpha/(1 + \alpha) <
\alpha^* < (4/3)\alpha$; if $\alpha$ is also small, such inequalities imply that $\alpha^* \approx \alpha$. That is, if
both $\alpha$ and $\beta$ are small, then the error-probability $\alpha^*$ corresponding to the intrinsic
evidential strength of the outcome “reject $H_1$” is approximately equal to $\alpha$.


Parallel remarks apply to evidential interpretations of the outcome “accept
$H_1$.”
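
The dependence of $\alpha^*$ on both $\alpha$ and $\beta$ can be checked numerically. The sketch below simply evaluates $\alpha^* = \alpha/(1 - \beta + \alpha)$; the function name and the particular values of $\alpha$ and $\beta$ are illustrative assumptions.

```python
def intrinsic_alpha(alpha: float, beta: float) -> float:
    """alpha* = 1 / (1 + L2) with L2 = (1 - beta) / alpha,
    written in the equivalent form alpha / (1 - beta + alpha)."""
    return alpha / (1.0 - beta + alpha)


# A powerful test: level .01 and beta = .2 give alpha* close to alpha.
alpha_star_powerful = intrinsic_alpha(0.01, 0.2)

# A weak test at the same level: power .015 gives L2 = 1.5 and
# alpha* = .4, so "reject H1" is then only slight evidence.
alpha_star_weak = intrinsic_alpha(0.01, 0.985)
```

The second case shows a level of .01 coexisting with an intrinsic error-probability near .5, which is the situation the text warns about.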


While the preceding considerations clarify, and in important cases support, 
certain qualitative and quantitative features of customary uses and interpreta- 
tions of significance tests as techniques for informative inference, they do not 
completely support the method of significance tests as such for purposes of 
informative inference. For such purposes, the methods based as described above 


directly on the likelihood function are preferable in principle, for the reasons 
given there. 







16. Relations of statistical evidence to prior information and to conclusions. 
The preceding sections have dealt with a single aspect of situations of informa- 
tive inference: the nature and properties of experimental outcomes as evidence 
relevant to statistical hypotheses. If each statistical hypothesis represented in a 
binary experiment is regarded initially as possibly true, then in many situations 
evidence against one hypothesis, if sufficiently strong, would support a conclusion 
that that hypothesis is false. The general nature of conclusions in various con- 
texts of investigation, their uses, limitations, and possible ultimate reversibility, 
are familiar (cf., Tukey, [4]). These features of conclusions, and the strength of 
statistical evidence which would suffice in any given situation to support a con- 
clusion, are among the aspects of inference situations (like (b) and (c) of Section 
9 above) whose formal specification is problematical. But the process by which 
informal consideration of the various aspects of inference situations, including 
experimental outcomes, sometimes leads to conclusions, is familiar; and the 
formal and objective evidential properties of experimental outcomes, analyzed 
above, are conveniently assimilable in this process. 

One aspect of an inference situation whose formal specification is often proble- 
matical is that of prior opinions or information, including relevant previous 
experience, indirect evidence, and general theoretical considerations. Bayesian 
treatments of inference problems, in which such considerations are represented 
by prior probabilities (in some sense) of the statistical hypotheses considered, 
will not be discussed here, except to note that they coincide with the informal 
process referred to in the preceding paragraph in taking just the likelihood 
function as the appropriate indicator of evidence in outcomes relevant to the 
hypotheses, and that they differ only in their degree and mode of formalization 
of other aspects of an inference situation. 


17. Acknowledgments. An example given by Cox [5] illustrated the usefulness 
of mixtures of experiments for analysis of problems in the foundations of statis- 
tical inference. A special status and role of the likelihood function in informative 
inference was pointed out by Fisher and by Barnard [6]; however the above 
methods of analysis and interpretation are new. 


REFERENCES 

[1] Bohnenblust, H. F., Shapley, L. S. and Sherman, S., “Reconnaissance in game
theory,” Research Memorandum RM-208, The Rand Corporation, Santa Monica,
August 12, 1949.

[2] Blackwell, David, “Comparison of experiments,” Proceedings of the Second Berkeley
Symposium on Mathematical Statistics and Probability, University of California
Press, Berkeley, 1951, pp. 93-102.

[3] Shannon, Claude E., “A note on a partial ordering for communication channels,”
Information and Control, Vol. 1 (1958), pp. 357-372.

[4] Tukey, John W., “Conclusions vs. decisions,” Technometrics, Vol. 2 (1960), pp. 423-433.

[5] Cox, D. R., “Some problems connected with statistical inference,” Ann. Math. Stat.,
Vol. 29 (1958), pp. 357-372.

[6] Barnard, G. A., “Statistical inference,” J. Roy. Stat. Soc., Suppl., Vol. 11 (1949), pp.
115-139.





SOME EXTENSIONS OF THE IDEA OF BIAS 
By H. R. van der Vaart¹
Leiden University 

1. Introduction and Summary. Laplace ([13], p. 44, lines 5 and 6), in his
statement concerning the “milieu de probabilité”, seems to have referred to a
probability distribution of the true value of a certain quantity (“le véritable instant du
phénomène”), or, as we would say at present, to a probability distribution of a
certain parameter. Thereby he differs from the attitude adopted in most of the
work discussed in the present paper. Yet, one might hold that he possessed the
idea of median-unbiased estimators. At any rate, when applying his notions to
what Todhunter ([26], p. 469, art. 875) calls a case of no practical value, Laplace
([13], p. 48, lines 11 and 12 from the bottom) virtually rejected the use of
arithmetic means of observations. Judging from innumerable texts, one finds that
after him emphasis has long been mainly on mean-unbiasedness (see, however,
Pitman ([20], bottom of p. 215), who mentions the existence of bias in the sense
that the probability that a certain mean-unbiased estimator is less than the
parameter in question is $> \tfrac{1}{2}$). Yet it is hard to find the requirement of
mean-unbiasedness justified in print (cf., Brown ([3], lines 6-8 of Section 3): the average
of independent mean-unbiased estimates is consistent; Lehmann ([14], lines 4-10
from bottom of p. 588): mean-unbiasedness flows from his general concept in the
case of a quadratic loss function; Birnbaum ([2], p. 32): mean-unbiasedness is
merely a technically useful property of the classical estimators in the linear
estimation problem, which, at least in the case of normal errors, could equally
well or preferably be justified on the basis of median-unbiasedness), much harder,
in fact, than to find warnings against the hope that much is gained if an estimator
be mean-unbiased (cf., Kendall ([12], Vol. 2, Section 17.9); the examples provided
by Girshick, Mosteller and Savage ([9], middle of p. 20), Halmos ([10], the end
of p. 43), Savage ([23], bottom of p. 244); lack of invariance under certain
transformations being stressed by Halmos ([10], bottom of p. 42), Brown ([3], lines
13-16 of Section 3), Fisher ([7], p. 143, line 13 from bottom)). All the same, much
interesting work has been devoted to mean-unbiased estimators, some of it
investigating the conversion of biased estimators into unbiased ones (e.g.,
Quenouille [21], Olkin and Pratt [17]), or deriving unbiased estimators ab initio (e.g.,
Tate [25]). It is not the purpose of this paper to provide a bibliography that is
at all near completeness, but it is interesting that the last two references
illustrate a statement, made by Schmetterer ([24], middle of p. 215), to the effect
that a close connection exists between integral equations and linear operators
on the one hand, and the theory of mean-unbiased estimators on the other. This

Received September 17, 1959; revised September 12, 1960. 

¹ Research sponsored by the Office of Ordnance Research under Contract No. DA-36-034-ORD-1517 (RD)
while the author held a visiting appointment at the Institute of Statistics,
North Carolina State College, Raleigh, N. C.


436 





EXTENDED IDEAS OF BIAS 437 


suggests that part of the motivation for the research in this field is of a mathe- 
matical, rather than a statistical nature. This view seems to be corroborated by 
Fraser’s statement ([8], lines 12-14 from bottom of p. 49) to the effect that median- 
unbiasedness does not seem to lend itself to the mathematical analysis needed to 
find minimum risk estimates, and hence has found little application. 

The present paper seeks to extend the notion of unbiasedness (and the notion 
of bias) in a direction different from Lehmann [14] (who gave a definition within 
the framework of general decision theory), and from Brown [3] (who was pri- 
marily concerned with types of unbiasedness, among them median-unbiasedness, 
that are invariant “under simultaneous one-to-one transformations of the 
parameter and (its) estimate’’, or rather under simultaneous strictly monotone 
transformations of the parameter and its estimate), and from Peterson’s [19] 
density-unbiasedness. It originated in work by the author [29], [30], on the esti- 
mation of the latent roots of certain matrices occurring in response surface 
theory. It had become clear that in this case it was of primary interest whether 
or not the frequency of obtaining too small (or too large, respectively) estimates 
would be unduly large. The present paper will make this notion more precise. 
Several types of bias (or of unbiasedness, respectively) will emerge, all of them 
clearly invariant in the sense of Brown. Median-unbiasedness will turn out to be 
a special case of this larger concept. Finally, certain seemingly unfamiliar proper- 
ties of the sample median, of the product-moment correlation coefficient, and of 
Olkin and Pratt’s function of the latter [17] will be proved and used to illustrate 
some of the concepts discussed. 


2. Some new bias concepts. Let $\varphi(P)$ be a real valued function (a “parameter”)
defined on a set $\mathcal{P}$ of probability distributions $P$ on a space $\mathcal{X}$ of points $x$. Let
the real function $f(X)$ of the random variable $X$ represent an estimator of $\varphi(P)$.
When would one call an estimate $f(x)$ too small? A reasonable answer would be:
if this estimate is smaller than a certain value (possibly depending on $P$) which
is to be called the comparing value, and to be denoted by $\gamma(P)$; the selection of
useful comparing values will be discussed after Definition 1. When would one
call the frequency of obtaining too small estimates unduly large? A reasonable
answer to this second question would be: if it is larger than it would have been
if a different (“better”) method of estimation had been used, that is,
if it is larger than it would have been with a different estimator, which is to
be called the comparing estimator, and to be denoted by $c(X)$. This tentative
argument naturally leads to the concept described in

DEFINITION 1. The estimator $f(X)$ of $\varphi(P)$ will be called negatively $\gamma(P)$-biased
relative to the estimator $c(X)$ if


(2.1) $P[f(X) \le \gamma(P)] > P[c(X) \le \gamma(P)]$ for each $P \in \mathcal{P}$.


By replacing the two $\le$-signs in (2.1) by $\ge$-signs one obtains the definition
of positive $\gamma(P)$-bias. Definition 1 has left the function $\gamma(P)$ unspecified: so
this type of bias comprises as many varieties as there are choices of comparing





438 H. R. VAN DER VAART 


values $\gamma(P)$. Whether, in a given estimation problem, a comparing value $\gamma(P)$
and a comparing estimator $c(X)$ can be chosen so as to provide a useful variety
of relative $\gamma(P)$-bias, will depend on the nature of the problem. We shall indicate
two examples in the next two paragraphs and refer the reader to point e and to
the last sentence of point k in Section 5 for an additional one.

As a first example, consider the problem of estimating the coefficients of the
canonical form of the second degree part of the equation for a quadratic response
surface. It is well known that the signs of these coefficients are important since
they determine the type (hyperbolic, ellipsoidal, etc.) of the surface. So here
$\gamma(P) = 0$ suggests itself as a comparing value, and if for all possible quadratic
response surfaces with positive true values of the canonical coefficients a certain
estimator $f(X)$ of the smallest coefficient is negatively zero-biased relative to a
comparing estimator $c(X)$, it is quite clear that in this respect the comparing
estimator is better than $f(X)$ (although it may be worse in some other respect;
cf., for instance point d of Section 5).

The second example is connected with the concept of median-bias. Suppose
that an estimator $g(X)$ of the “parameter” $\varphi(P)$ exists which satisfies the
condition


(2.2) $\operatorname{Med}_P g(X) = \varphi(P)$ for each $P \in \mathcal{P}$;


if more than one function $g(X)$ satisfies (2.2), just choose one of them;
$\operatorname{Med}_P g(X)$ denotes (one of) the median(s) of $g(X)$ under the probability
distribution $P$ on the space $\mathcal{X}$; Lehmann [15, p. 80-83], pointing out a simple
connexion between median-unbiased estimators and confidence intervals, gives a
condition on $\mathcal{P}$, which guarantees that one and only one estimator $g(X)$ will
satisfy (2.2). Now in (2.1) choose $\gamma(P) = \varphi(P)$, $c(X) = g(X)$, then because of
(2.2) condition (2.1) becomes

(2.3) $P[f(X) \le \varphi(P)] > P[g(X) \le \varphi(P)] = P[g(X) \le \operatorname{Med}_P g(X)] \ge \tfrac{1}{2}$.


Now, on one hand (2.3) means that, as an estimator of $\varphi(P)$, $f(X)$ is negatively
$\varphi(P)$-biased relative to the estimator $g(X)$, on the other hand, under a certain
condition, (2.3) entails the inequality $\operatorname{Med}_P f(X) < \varphi(P)$, which means that,
as an estimator of $\varphi(P)$, $f(X)$ is negatively median-biased (the above-mentioned
condition being that not only $P[f(X) \le \varphi(P)] - \tfrac{1}{2} > 0$, which follows from
(2.3), but also $P[f(X) \le \varphi(P)] - \tfrac{1}{2} > P[f(X) = \varphi(P)]$, which is certainly true
if for each $P \in \mathcal{P}$ the distribution of $f(X)$ is continuous). Thus the concept of
relative $\gamma(P)$-bias described in our Definition 1 is seen to generalize the concept
of median-bias.

In certain contexts it is useful to admit as comparing values all values
$\varphi(Q) \in \varphi(\mathcal{P})$, i.e., all possible values of the parameter. This leads to the concept
described in

DEFINITION 2. The estimator $f(X)$ of $\varphi(P)$ will be called negatively
distribution-biased relative to the estimator $c(X)$ if


(2.4) $P[f(X) \le \varphi(Q)] \ge P[c(X) \le \varphi(Q)]$ for each $P \in \mathcal{P}$, $Q \in \mathcal{P}$,


and the inequality is strict for at least one pair $(P, Q)$.

By replacing the two $\le$-signs in (2.4) by $\ge$-signs one obtains the definition of
positive distribution-bias. A close connexion clearly exists between the condition
for the estimator $f(X)$ being negatively distribution-biased with respect to the
estimator $c(X)$, and the condition for the random variable $f(X)$ being
stochastically smaller than the random variable $c(X)$; as to the latter concept see
Mann and Whitney ([16], line 3 of Section 2).

Note that, whereas the definitions of $\gamma(P)$-unbiasedness and median-unbiasedness
are self-evident (in (2.1) and (2.3) replace $>$ by $=$), the definition of
distribution-unbiasedness presents difficulties. On one hand, it seems impracticable
to define distribution-unbiasedness of $f(X)$ relative to $c(X)$ otherwise than as
$f(X)$ and $c(X)$ having the same distribution function. On the other hand, to
call $f(X)$ distribution-unbiased relative to $c(X)$ only if $f(X)$ and $c(X)$ have the
same distribution function for each $P \in \mathcal{P}$ (a condition obtained if in (2.4) the
$\ge$-sign is replaced by $=$) is unsatisfactory, because estimators will exist which
are neither biased nor unbiased in this sense. Hence we will not attempt a
definition of distribution-unbiasedness.

One more point has to be mentioned. One might think that it should be pos- 
sible to make the rather vague notion of negative bias as an unduly large fre- 
quency of obtaining too small estimates more precise without introducing the 
concept of comparing estimators: one might endeavour to define the frequency of 
obtaining too small estimates (i.e., estimates smaller than the comparing value 
$\gamma(P)$) as being unduly large if the probability of obtaining estimates $\le \gamma(P)$
would be large as compared with the probability of obtaining estimates $\ge \gamma(P)$;
that is to say, if the ratio $P[f(X) \le \gamma(P)]/P[f(X) \ge \gamma(P)]$ would be large, $> k$,
say. Thus, in order to make this approach work, we should have to decide upon 
the value of k. Decisions of this kind would be to a large extent arbitrary. Upon 
a moment’s reflection it turns out that about the only natural way to find 
“plausible” values of k consists in considering the value of the above-mentioned 
ratio of probabilities when another estimator, c(X) say, is substituted for f(X). 
Therewith our comparing estimator has proved indispensable. 


3. A remark on terminology. To avoid confusion we note that for
median-(un)bias(edness) and mean-(un)bias(edness) (cf., Brown [3], p. 583) other terms may
be substituted. For example, Eisenhart and Martin [6] use the term “downward
bias in the probability sense” instead of “negative median-bias”, and in a personal
communication to the author (June, 1958) Eisenhart uses “probability-wise
unbiasedness” instead of “median-unbiasedness”.

As is well known, $f(X)$ is a mean-unbiased estimator of $\varphi(P)$ if


(3.1) $\mathcal{E}_P f(X) = \varphi(P)$ for each $P \in \mathcal{P}$.


Instead of “mean-bias” Eisenhart and Martin [6] use “bias in the mean-value
sense”, in the above-mentioned communication to the author Eisenhart uses
‘‘on-the-average bias”, and the present author personally prefers expectation-bias 
(similarly expectation-unbiasedness), since bias and unbiasedness of an estimator 
are properties of its theoretical distribution, and statistical usage tends to substi- 
tute “expectation” for “mean” in connexion with theoretical distributions. 


4. A lemma. A very simple lemma, which nevertheless is a useful tool in prov- 
ing that certain estimators are biased in the sense discussed in Section 2, is 

LEMMA 1. Whether the random variables $T = t(X)$ and $U = u(X)$ are
independent or dependent, if


(4.1) $P[U \ge \upsilon] = 1$,

then

(4.2) $P[T \le \tau] \ge P[T + U \le \tau + \upsilon]$.

A necessary and sufficient condition for the equality sign to hold in (4.2) is

(4.3) $P[(T + U > \tau + \upsilon) \cap (T \le \tau)] = 0$.


PROOF. Let $U^* = U - \upsilon$ and $T^* = T - \tau$; the proof is then immediate from
a sketch in the $(T^*, U^*)$-plane.
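
Lemma 1 can also be seen sample by sample: whenever $U \ge \upsilon$, the event $T + U \le \tau + \upsilon$ is contained in the event $T \le \tau$, and the difference of the two is exactly the event in (4.3). The following simulation is a sketch under assumed distributions (a standard normal $T$ and a $U$ constructed to depend on $T$); the function name and the numerical choices are hypothetical.

```python
import random
from typing import Tuple


def lemma1_counts(
    n_draws: int, tau: float, upsilon: float, seed: int = 0
) -> Tuple[int, int, int]:
    """Count occurrences of three events in simulated (T, U) pairs.

    T is standard normal; U = upsilon + |T| * V with V uniform on [0, 1),
    so U depends on T and U >= upsilon always holds (condition (4.1)).
    Returns counts of {T <= tau}, {T + U <= tau + upsilon}, and the
    event of (4.3), {T + U > tau + upsilon and T <= tau}.
    """
    rng = random.Random(seed)
    count_t = count_sum = count_gap = 0
    for _ in range(n_draws):
        t = rng.gauss(0.0, 1.0)
        u = upsilon + abs(t) * rng.random()
        below_t = t <= tau
        below_sum = t + u <= tau + upsilon
        count_t += below_t
        count_sum += below_sum
        count_gap += below_t and not below_sum
    return count_t, count_sum, count_gap


# upsilon = 0 keeps the floating-point comparisons exact in this sketch.
n_t, n_sum, n_gap = lemma1_counts(10_000, tau=0.5, upsilon=0.0)
```

In this construction the event of (4.3) has positive probability, so the inequality in (4.2) is strict; the counts satisfy `n_t == n_sum + n_gap` by the inclusion of events.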

Although joint distributions of $T$ and $U$ satisfying (4.3) may be rather
uncommon, it is evident that (4.3) cannot be proved or disproved from the
conditions of the lemma alone. Two extra conditions, each of them sufficient for
(4.3) not to hold, are

($\alpha$) each (measurable) set in the half plane $U > \upsilon$ in $(T, U)$-space has positive
probability,

($\beta$) (implied by $\alpha$:) Some set $[(U > \upsilon_0) \cap (\tau \ge T > \tau_0)]$ with $\tau_0 + \upsilon_0 > \tau + \upsilon$,
$\tau_0 < \tau$, has positive probability; this condition is satisfied for instance if $U$ and
$T$ are independent and $P[\tau \ge T > \tau_0] > 0$, $P[U > \upsilon_0] > 0$.

While Lemma 1 may serve to prove negative bias of the types discussed in
Section 2, positive bias may be derived from Lemma 1', which is obtained from
Lemma 1 by reversing all six inequality-signs in Lemma 1, except the second
inequality-sign in (4.2).


5. Supplementary remarks and examples. 

(a). From Definition 1 it follows easily that relative $\gamma(P)$-bias is transitive:
if $g_1(X)$, $g_2(X)$, and $g_3(X)$ are estimators of $\varphi(P)$, and if $g_1(X)$ is negatively
$\gamma(P)$-biased relative to $g_2(X)$, and $g_2(X)$ is negatively $\gamma(P)$-biased relative to
$g_3(X)$, then $g_1(X)$ is negatively $\gamma(P)$-biased relative to $g_3(X)$. Hence it is
possible to arrange any number of estimators according to degree of negative
$\gamma(P)$-bias; in the above case we would have: $g_1(X) \succ g_2(X) \succ g_3(X)$ (where $\succ$ would
mean: “has more negative $\gamma(P)$-bias than”; the value of the difference
$P[g_1(X) \le \gamma(P)] - P[g_2(X) \le \gamma(P)]$ would be a useful measure of how much
more $\gamma(P)$-biased $g_1(X)$ is than $g_2(X)$). Although this arrangement would not
without further consideration permit the conclusion that $g_3(X)$ is a better
estimator than $g_2(X)$, and $g_2(X)$ a better estimator than $g_1(X)$, it is clear that
in general, if $\gamma(P) < \varphi(P)$ for each $P \in \mathcal{P}$, one would tend to consider estimators
to be worse as they are more biased in the sense of negative $\gamma(P)$-bias. In the same
vein, if $\gamma(P) > \varphi(P)$ for each $P \in \mathcal{P}$, one would tend to consider estimators to be
worse as they are more biased in the sense of positive $\gamma(P)$-bias.

(b). Through the concept of $\gamma(P)$-bias the notions of bias and of inefficiency
merge into each other: if the estimator $f_1(X)$ of $\varphi(P)$ is distributed $N(\varphi(P), 2\sigma^2)$
and the estimator $f_2(X)$ is distributed $N(\varphi(P), \sigma^2)$, then $f_1(X)$ is both negatively
$(\varphi(P) - k\sigma)$-biased ($k > 0$) relative to $f_2(X)$ and less efficient than $f_2(X)$.
(Note that in the examples b, c, d the parameter $\sigma$ is assumed to be known.)

(c). It should not be surprising (though it is worth while noting) that
different criteria of (un)bias(edness) may be incompatible (see also point f below).
Thus, even an expectation-biased and median-biased estimator like $f_3(X)$,
distributed $N(\varphi(P) - \tfrac{1}{2}\sigma, \sigma^2)$, would from the point of view of negative
$(\varphi(P) - 2\sigma)$-bias, say, be better (i.e., less biased) than the expectation-unbiased
and median-unbiased estimator $f_1(X)$, distributed $N(\varphi(P), 2\sigma^2)$. This pair of
estimators is interesting for yet another reason: $f_3(X)$ has also a smaller mean
square error than has $f_1(X)$: $\mathcal{E}[f_3(X) - \varphi(P)]^2 < \mathcal{E}[f_1(X) - \varphi(P)]^2$. Thus the
idea of $\gamma(P)$-bias, in cases where a comparing value $\gamma(P)$ naturally suggests
itself, may help to bridge the gap which often exists between the requirements of
least bias and least mean square error.
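
The comparison in point (c) can be verified numerically. The sketch below takes $\varphi(P) = 0$ and $\sigma = 1$ (assumed values, chosen only for illustration), reads the two estimators as $N(\varphi, 2\sigma^2)$ and $N(\varphi - \tfrac{1}{2}\sigma, \sigma^2)$, and uses the comparing value $\varphi - 2\sigma$; the helper names are hypothetical.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Distribution function of N(mean, sd^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mean_square_error(mean: float, sd: float, target: float) -> float:
    """E[(f - target)^2] = variance + squared expectation-bias."""
    return sd ** 2 + (mean - target) ** 2


phi, sigma = 0.0, 1.0          # assumed parameter value and known sigma
gamma = phi - 2.0 * sigma      # comparing value of point (c)

# f1 ~ N(phi, 2 sigma^2): expectation-unbiased and median-unbiased.
p_f1 = normal_cdf(gamma, phi, sigma * math.sqrt(2.0))
mse_f1 = mean_square_error(phi, sigma * math.sqrt(2.0), phi)

# f3 ~ N(phi - sigma/2, sigma^2): expectation-biased and median-biased.
p_f3 = normal_cdf(gamma, phi - 0.5 * sigma, sigma)
mse_f3 = mean_square_error(phi - 0.5 * sigma, sigma, phi)
```

Under these assumptions `p_f1` exceeds `p_f3` (roughly .079 against .067) and `mse_f3` is below `mse_f1`, in agreement with the two claims made for this pair of estimators.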

(d). However, the requirement of least $\gamma(P)$-bias will not always agree so
well with other criteria for “good” estimators. For instance, the above-mentioned
estimator $f_2(X)$ of $\varphi(P)$, distributed $N(\varphi(P), \sigma^2)$, is negatively
$(\varphi(P) - k\sigma)$-biased ($k > 0$) relative to the estimator $f_4(X)$, distributed $N(\varphi(P) + \sigma, \sigma^2)$.
So, according to point a above, one would tend to consider $f_4(X)$ a better
estimator. Yet, $f_4(X)$ has a number of undesirable features: for instance, it is not
only positively expectation-biased and median-biased, it has also greater mean
square error than $f_2(X)$. On closer inspection another thing turns out to be
wrong with $f_4(X)$ as an estimator of $\varphi(P)$: for any $k > 0$ it is
positively $(\varphi(P) + k\sigma)$-biased relative to $f_2(X)$; so, according to the last sentence
of point a above, $f_4(X)$ is worse than $f_2(X)$ in this respect. Thus, here is a simple
example where the probability of obtaining too small estimates has been
corrected at the expense of enlarging the probability of obtaining too large estimates.
In certain contexts this may be all right, in other contexts it may be undesirable:
if negative $\gamma'(P)$-bias ($\gamma'(P) < \varphi(P)$) and positive $\gamma''(P)$-bias ($\gamma''(P) > \varphi(P)$)
are about equally undesirable features, both have to be kept as small as possible
in some sense. This remark points out the relationship between our concept of
(relative) $\gamma(P)$-bias and two other criteria for “good” estimators:

(1) criterion 3 of L. J. Savage [23, p. 224], according to which an estimator
$g_1(X)$ is called better than an estimator $g_2(X)$ if $P[g_1(X) < \gamma_1] + P[g_1(X) >
\gamma_2] \le P[g_2(X) < \gamma_1] + P[g_2(X) > \gamma_2]$ for every $\gamma_1 \le \varphi(P)$, $\gamma_2 \ge \varphi(P)$, $P \in \mathcal{P}$
(with strict inequality for some $\gamma_1$, $\gamma_2$, and some $P$);

(2) the approach of A. Birnbaum² ([2], pp. 113 seq.), who uses as a criterion the
behavior of the function $a(\gamma, P; g)$ for $P \in \mathcal{P}$, $\gamma \in \varphi(\mathcal{P})$: $a(\gamma, P; g) = P[g(X) \le \gamma]$
if $\gamma < \varphi(P)$ and $a(\gamma, P; g) = P[g(X) \ge \gamma]$ if $\gamma > \varphi(P)$.

(e). Summarizing, it seems fitting to note that, if it is of primary importance
to avoid an unduly large frequency of obtaining too small estimates, then the
concept of (relative) negative $\gamma(P)$-bias, $\gamma(P) < \varphi(P)$, leads to a useful criterion
for good estimators. Similarly, positive $\gamma(P)$-bias is a useful concept if it is
important to avoid too large estimates. Examples of this situation are the estimation
of latent roots (discussed at the end of Section 1, and in the first example after
Definition 1 in Section 2), and the estimation of the correlation coefficient (to
be discussed in points i and k below): in both these cases the comparing value of
interest is zero. However, if it is important to avoid errors of under-estimation
and over-estimation at the same time, then criterion 3 of Savage [23, p. 224] and
the approach by Birnbaum, although the latter leads only to a partial ordering
of estimators, provide more natural criteria than does the concept of
distribution-bias (Definition 2 in Section 2), though this term is useful in that it permits
a succinct statement of certain results.

(f). Next, we will give examples of an expectation-unbiased estimator which
is median-biased (variance), of a median-unbiased estimator which is
expectation-biased (median), of a negatively expectation-biased estimator
which is positively median-biased (correlation-coefficient), of median-bias
becoming less when an estimator is corrected for expectation-bias (variance), and
of median-bias becoming worse when an estimator is corrected for
expectation-bias (correlation-coefficient).


(g). Let $X_1, X_2, \cdots, X_n$ be an $n$-fold sample from a normal distribution
$N(\mu, \sigma^2)$. Then

$S^2 = (n - 1)^{-1} \sum (X_i - \bar{X})^2$

is an expectation-unbiased estimator of the variance $\sigma^2$. However, $S^2$ is negatively
median-biased: we will show that

(5.1) $P[S^2 \le \sigma^2] > \tfrac{1}{2}$.

Note that

(5.2) $P[S^2 \le \sigma^2] = 1 - Q(n - 1 \mid n - 1) = \gamma\{\tfrac{1}{2}(n - 1), \tfrac{1}{2}(n - 1)\} / \Gamma\{\tfrac{1}{2}(n - 1)\}$,

where the function $Q(\chi^2 \mid \nu)$ is defined by Pearson and Hartley [18, p. 122] and
$\gamma(a, x) = \int_0^x e^{-t} t^{a-1}\,dt$ (cf., [1], Vol. 2, p. 133). Now, as

$e^{-a} a^a > \int_a^{a+1} e^{-t} t^a\,dt$,

2 I want to thank the referee for drawing my attention to Mr. Birnbaum’s paper and to 
the connection between his approach and mine. 





we have

$a \int_0^a e^{-t} t^{a-1}\,dt = e^{-a} a^a + \int_0^a e^{-t} t^a\,dt > \int_0^{a+1} e^{-t} t^a\,dt$,

whence

$a\,\gamma(a, a) > \gamma(a + 1, a + 1)$,

$\gamma(a, a)/\Gamma(a) > \gamma(a + 1, a + 1)/\Gamma(a + 1)$ for any $a > 0$.

Hence, for any $\beta$, $0 < \beta \le 1$, $\lim_{n \to \infty} \gamma(\beta + n, \beta + n)/\Gamma(\beta + n) = L(\beta) \ge 0$
exists and

(5.3) $\gamma(\beta + n, \beta + n)/\Gamma(\beta + n) > L(\beta)$

for any integer $n > 0$. From the asymptotic expression for $\gamma(a + 1, a + (2a)^{1/2} y)$,
given by Tricomi [27, p. 144, eq. (27)], one can derive that

(5.4) $\gamma(a, a)/\Gamma(a) = \tfrac{1}{2} + [3(2\pi a)^{1/2}]^{-1} + O(a^{-1})$,

which shows that in (5.3) $L(\beta) = \tfrac{1}{2}$, independent of $\beta$. Therefore,

(5.5) $\gamma(a, a)/\Gamma(a) > \tfrac{1}{2}$ for any $a > 0$,

which, together with (5.2), proves (5.1).

Equation (5.2) permits the calculation of $P[S^2 \le \sigma^2]$ from the table of $\chi^2$
by Pearson and Hartley [18, p. 122]. For $n - 1 = 1, 2, 3$ one finds $P[S^2 \le \sigma^2] =
0.683, 0.632, 0.608$. For $n - 1 \ge 4$ the asymptotic expression (5.4) turns out
to yield results which are accurate to 3 significant decimal places!
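
The values quoted above can be reproduced without tables. The sketch below evaluates $\gamma(a, a)/\Gamma(a)$ by the standard power series for the incomplete gamma function and compares it with the two leading terms of (5.4); the function names, the series truncation, and the tolerance are choices made for this illustration.

```python
import math


def regularized_lower_gamma(a: float, x: float, max_terms: int = 500) -> float:
    """gamma(a, x) / Gamma(a) for a > 0, x >= 0, from the series
    x^a e^{-x} / Gamma(a + 1) * sum_{k >= 0} x^k / ((a+1)(a+2)...(a+k))."""
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))


def prob_s2_at_most_sigma2(n: int) -> float:
    """P[S^2 <= sigma^2] for a normal sample of size n, by equation (5.2)."""
    a = 0.5 * (n - 1)
    return regularized_lower_gamma(a, a)


def leading_terms_of_5_4(a: float) -> float:
    """1/2 + 1 / (3 sqrt(2 pi a)), the explicit part of (5.4)."""
    return 0.5 + 1.0 / (3.0 * math.sqrt(2.0 * math.pi * a))


# n - 1 = 1, 2, 3 and 4 degrees of freedom.
exact_values = [prob_s2_at_most_sigma2(n) for n in (2, 3, 4, 5)]
approx_values = [leading_terms_of_5_4(0.5 * (n - 1)) for n in (2, 3, 4, 5)]
```

For $n - 1 = 4$ the exact value and the approximation agree to about four decimal places, and every exact value lies above $\tfrac{1}{2}$ as (5.5) requires.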

The median-bias of S’, hence of S, is of interest in quality control, cf., Eisenhart 
[5]. The present author wants to thank Mr. Eisenhart for his kind letter (of 
June, 1958), in which he mentioned this interesting article as well as the ab- 
stract [6], where six different estimators of $\sigma$ are investigated as to their
median-bias, and the report [4], where among other things a table for $P[s \le \sigma]$ is given.

(h). Let $X_1, X_2, \cdots, X_{2m+1}$ be an odd-sized sample from a univariate
distribution with continuous distribution function $F(x)$, for which $dF(x)/dx$ is
positive in one (finite or infinite) $x$-interval. Rearranging the $2m + 1$ values
in the sample, use the notation $X^{(1)} \le X^{(2)} \le \cdots \le X^{(2m+1)}$. Under the
conditions stated the occurrence of equality-signs has probability zero; $X^{(m+1)}$ is the
sample median. Let $G$ denote the inverse (defined for $0 < F < 1$) of the function
$F$, so that $G(\tfrac{1}{2})$ is the median of the distribution considered.

From the well-known formula

$P[X^{(m+1)} \le y] = [B(m + 1, m + 1)]^{-1} \int_{-\infty}^{y} [F(\eta)]^m [1 - F(\eta)]^m\,dF(\eta)$

it follows immediately that

(5.6) $P[X^{(m+1)} \le G(\tfrac{1}{2})] = [B(m + 1, m + 1)]^{-1} \int_0^{1/2} F^m (1 - F)^m\,dF = \tfrac{1}{2}$.





444 H. R. VAN DER VAART 


Hence the sample median $X^{(m+1)}$ is a median-unbiased estimator of the median $G(\tfrac12)$, whether $F(x)$ represents a skew distribution or a symmetric one.

On the other hand, it is very simple to define classes of continuous distribution functions such that the sample median $X^{(m+1)}$ is an expectation-biased estimator of the median $G(\tfrac12)$. For we have


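\[
\begin{aligned}
B(m+1,m+1)\cdot[\mathcal{E}X^{(m+1)} - G(\tfrac12)]
&= \int_{-\infty}^{+\infty}(y - G(\tfrac12))\,[F(y)]^m[1-F(y)]^m\,dF(y) \\
&= \int_0^1 (G(F) - G(\tfrac12))\,F^m(1-F)^m\,dF \\
&= \int_{-1/2}^{+1/2}[G(\tfrac12+h) - G(\tfrac12)]\,(\tfrac14 - h^2)^m\,dh \\
&= \int_0^{1/2}\{[G(\tfrac12+h) - G(\tfrac12)] - [G(\tfrac12) - G(\tfrac12-h)]\}\,(\tfrac14 - h^2)^m\,dh.
\end{aligned}
\tag{5.7}
\]

Hence a sufficient condition for the sample median being a positively expectation-biased estimator of the median is that $G(\tfrac12+h) - G(\tfrac12) > G(\tfrac12) - G(\tfrac12-h)$, $0 < h < \tfrac12$, which describes a certain type of skewness of the distribution function $F(x)$.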
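
As an illustration of the last member of (5.7), the following sketch evaluates the expectation-bias of the sample median by a midpoint rule. The two quantile functions (standard exponential and standard logistic) and all function names are assumptions chosen for the example; they do not appear in the paper. For the exponential case the result can be compared with the known expectation of the middle order statistic of three observations, $\tfrac12 + \tfrac13$.

```python
import math

def beta_fn(a, b):
    """Complete beta function B(a, b) via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def median_expectation_bias(G, m, steps=100_000):
    """E[X^(m+1)] - G(1/2) from the last member of (5.7),
    integrated over 0 < h < 1/2 with the midpoint rule."""
    width = 0.5 / steps
    total = 0.0
    for i in range(steps):
        h = (i + 0.5) * width
        skew = (G(0.5 + h) - G(0.5)) - (G(0.5) - G(0.5 - h))
        total += skew * (0.25 - h * h) ** m
    return total * width / beta_fn(m + 1, m + 1)

def exponential_quantile(u):
    """Inverse d.f. of the standard exponential (right-skewed)."""
    return -math.log(1.0 - u)

def logistic_quantile(u):
    """Inverse d.f. of the standard logistic (symmetric)."""
    return math.log(u / (1.0 - u))

bias_exponential = median_expectation_bias(exponential_quantile, m=1)
bias_logistic = median_expectation_bias(logistic_quantile, m=1)
print(bias_exponential, bias_logistic)
```

The skewed case gives a positive bias, while the symmetric case gives a bias of zero up to rounding, as the sufficient condition above suggests.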

(i). Let $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ be an $n$-fold sample from a bivariate normal distribution with correlation coefficient $\rho$. Define the sample correlation coefficient $R$* in the usual way by

\[ R = \sum_i (X_i - \bar X)(Y_i - \bar Y)\Big/\Big\{\sum_i (X_i - \bar X)^2 \sum_j (Y_j - \bar Y)^2\Big\}^{1/2}. \]

It is well known (cf. Kendall ([12], Vol. 1, p. 344, eq. (14.55)), Romanovsky ([22], p. 42, eq. (128)), and reference ([1], Vol. 1, p. 59, eq. (10), and p. 114, eq. (1))) that

\[
\begin{aligned}
\mathcal{E}R = \rho\cdot g(n,\rho^2)
&= \rho\cdot\{\Gamma(\tfrac12 n)\}^2\cdot\{\Gamma[\tfrac12(n-1)]\cdot\Gamma[\tfrac12(n+1)]\}^{-1}\cdot F(\tfrac12,\tfrac12;\tfrac12(n+1);\rho^2) \\
&= \rho\cdot\Gamma(\tfrac12 n)\cdot\{\Gamma(\tfrac12)\cdot\Gamma[\tfrac12(n-1)]\}^{-1}\int_0^1 t^{-1/2}(1-t)^{\frac12(n-2)}(1-t\rho^2)^{-1/2}\,dt,
\end{aligned}
\tag{5.8}
\]

where Euler's integral representation for the hypergeometric function has been applied. From $(1 - t\rho^2)^{-1/2} < (1-t)^{-1/2}$ for any $\rho^2 \ne 1$, $t \ne 0$, it follows that for any $\rho^2 \ne 1$ the integral in the last member of (5.8) is less than

\[ \int_0^1 t^{-1/2}(1-t)^{\frac12(n-3)}\,dt = B(\tfrac12, \tfrac12(n-1)), \]

whence in (5.8) $g(n,\rho^2) < 1$ for any $\rho^2 \ne 1$, so that $|\mathcal{E}R| < |\rho|$: $R$ is a negatively expectation-biased estimator of $\rho$ if $\rho > 0$, a positively expectation-biased estimator if $\rho < 0$.
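
The factor $g(n,\rho^2)$ in (5.8) is easy to tabulate. The sketch below sums the Gauss hypergeometric series directly; the helper names are hypothetical, and the direct summation (adequate for $\rho^2 < 1$) is simply one way to evaluate $F$, not a procedure taken from the paper.

```python
import math

def hyp2f1(a, b, c, z, tol=1e-15, max_terms=10_000):
    """Gauss series sum_k (a)_k (b)_k / (c)_k * z^k / k!, for |z| < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def g_factor(n, rho):
    """g(n, rho^2) of (5.8), so that E[R] = rho * g(n, rho^2)."""
    log_ratio = (2.0 * math.lgamma(n / 2.0)
                 - math.lgamma((n - 1) / 2.0)
                 - math.lgamma((n + 1) / 2.0))
    return math.exp(log_ratio) * hyp2f1(0.5, 0.5, (n + 1) / 2.0, rho * rho)

# The factor stays below 1 and approaches 1 as n grows.
for n in (3, 5, 10, 50):
    print(n, [round(g_factor(n, rho), 4) for rho in (0.0, 0.3, 0.6, 0.9)])
```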


* Note that here $R$ is not the multiple correlation coefficient: capitals are used throughout the paper in order to denote random variables.







In order to investigate possible median-bias of $R$ as an estimator of $\rho$, use formula (25) of Hotelling [11, p. 200] (note that Hotelling's $n$ stands for our $n - 1$), by which

\[
P[R \ge \rho] = (n-2)\cdot\Gamma(n-1)\cdot(2\pi)^{-1/2}\cdot\{\Gamma(n-\tfrac12)\}^{-1}\cdot(1-\rho^2)^{\frac12(n-1)}
\int_\rho^1 (1-r^2)^{\frac12(n-4)}(1-\rho r)^{-\frac12(2n-3)}\,F(\tfrac12,\tfrac12;n-\tfrac12;\tfrac12(1+\rho r))\,dr.
\tag{5.9}
\]

By means of the substitution $(r-\rho)/(1-\rho r) = y$ the second member of (5.9) after some patient algebra reduces to

\[
\frac{(n-2)\,\Gamma(n-1)}{(2\pi)^{1/2}\,\Gamma(n-\frac12)}\int_0^1 (1-y^2)^{\frac12(n-4)}(1+\rho y)^{\frac12}\,
F\Big(\tfrac12,\tfrac12;n-\tfrac12;\tfrac12\,\frac{1+2\rho y+\rho^2}{1+\rho y}\Big)\,dy.
\tag{5.10}
\]

As $\rho$ increases from 0 to 1, the integrand of (5.10) increases with $\rho$ for any $y > 0$. Since $P[R \ge \rho] = \tfrac12$ if $\rho = 0$, this means that $P[R \ge \rho] > \tfrac12$ if $\rho > 0$: $R$ is a positively median-biased estimator of $\rho$ if $\rho > 0$. The same argument yields the result that $R$ is a negatively median-biased estimator of $\rho$ if $\rho < 0$. In fact, $P[R \le \rho]$ equals the second member of (5.9) after integration from $\rho$ to 1 has been replaced by integration from $-1$ to $\rho$; in the expression thus obtained one may replace $\rho$ by $-|\rho|$ if $\rho < 0$; from the elementary substitution $r = -z$ in the resulting integral it then follows immediately that if $\rho < 0$, $P[R \le \rho]$ exactly equals the second member of (5.9), hence equals (5.10), if only $|\rho|$ is substituted for $\rho$; hence $P[R \le \rho] > \tfrac12$ if $\rho < 0$.

This example is interesting since it shows that the contention made by Tschuprow [28, p. 116], to the effect that the estimator $R$ systematically underrates $\rho$, is dubious in that it may be taken to mean that $R$ more frequently than not underrates $\rho$, which is not true, as $R$ more frequently than not overrates $\rho$ (for $\rho > 0$).

(j). With $S^2$, see under point g, compare ${}^*S^2$ as an estimator of $\sigma^2$: ${}^*S^2 = n^{-1}\cdot\sum (X_i - \bar X)^2$. Evidently ${}^*S^2 = S^2(1 - n^{-1})$, hence $\sigma^2 > \mathrm{Med}\,S^2 > \mathrm{Med}\,{}^*S^2$. At the same time $\sigma^2 = \mathcal{E}S^2 > \mathcal{E}\,{}^*S^2$. So when the estimator ${}^*S^2$ is replaced by $S^2$, its expectation-bias is corrected, and its median-bias becomes less. Unfortunately, such a state of affairs is not universal, as is shown by the next example.


(k). With $R$, see under point i, compare ${}^*R$ as an estimator of $\rho$:

\[ {}^*R = R\cdot F(\tfrac12,\tfrac12;\tfrac12(n-1);1-R^2), \tag{5.11} \]

cf. Olkin and Pratt ([17], p. 202, eq. (2.3)). The second member of (5.11) is a strictly increasing function of $R$, cf. [17, Section 2.2]. Hence

\[ \mathrm{Med}_\rho\,{}^*R = (\mathrm{Med}_\rho R)\cdot F\{\tfrac12,\tfrac12;\tfrac12(n-1);1-(\mathrm{Med}_\rho R)^2\}. \tag{5.12} \]


As $|\mathrm{Med}_\rho R| < 1$ if $|\rho| < 1$, the hypergeometric series in (5.12) is easily seen to be strictly larger than 1 if $|\rho| < 1$. Therefore, if $0 < \rho < 1$, $\mathrm{Med}_\rho R > \rho > 0$ (see under point i) and $\mathrm{Med}_\rho\,{}^*R > \mathrm{Med}_\rho R$; if $0 > \rho > -1$, $\mathrm{Med}_\rho R < \rho < 0$ (see under point i) and $\mathrm{Med}_\rho\,{}^*R < \mathrm{Med}_\rho R$.

So substituting ${}^*R$ for $R$ as an estimator of $\rho$ corrects expectation-bias, but makes median-bias worse. Finally, it is evident from (5.11) that ${}^*R$ and $R$ are zero-unbiased with respect to each other.

REFERENCES 

[1] Bateman Manuscript Project, Higher Transcendental Functions, Vol. 1; Vol. 2; McGraw-Hill, New York, 1953.

[2] Allan Birnbaum, "A unified theory of estimation I," Ann. Math. Stat., Vol. 32 (1961), pp. 112-135.

[3] George W. Brown, "On small-sample estimation," Ann. Math. Stat., Vol. 18 (1947), pp. 582-585.

[4] Joseph M. Cameron, "Tables for computing confidence limits for σ," SEL Note 54-11, Statistical Engineering Laboratory, National Bureau of Standards, 1954.

[5] Churchill Eisenhart, "Probability center lines for standard deviation and range charts," Industrial Quality Control, Vol. 6 (1949), pp. 24-26.

[6] Churchill Eisenhart and Celia S. Martin, "The relative frequencies with which certain estimators of the standard deviation of a normal population tend to underestimate its value," Ann. Math. Stat., Vol. 19 (1948), p. 600, Abstract.

[7] Sir Ronald A. Fisher, Statistical Methods and Scientific Inference, Hafner Pub., New York, 1956.

[8] D. A. S. Fraser, Nonparametric Methods in Statistics, John Wiley and Sons, New York, 1957.

[9] M. A. Girshick, Frederick Mosteller, and L. J. Savage, "Unbiased estimates for certain binomial sampling problems with applications," Ann. Math. Stat., Vol. 17 (1946), pp. 13-23.

[10] Paul R. Halmos, "The theory of unbiased estimation," Ann. Math. Stat., Vol. 17 (1946), pp. 34-43.

[11] Harold Hotelling, "New light on the correlation coefficient and its transforms," J. Roy. Stat. Soc., Ser. B, Vol. 15 (1953), pp. 193-232.

[12] Maurice G. Kendall, The Advanced Theory of Statistics, Vol. 1, 3rd ed., 1947; Vol. 2, 2nd ed., 1948, Griffin, London.

[13] P. S. de Laplace, "Mémoire sur la probabilité des causes par les événements," Oeuvres complètes de Laplace, Vol. 8, pp. 27-65, Gauthier-Villars, Paris, 1891; (originally published in: Mémoires de l'Académie royale des Sciences de Paris (Savants étrangers), Vol. 6 (1774), pp. 621 seq., par M. de la Place).

[14] E. L. Lehmann, "A general concept of unbiasedness," Ann. Math. Stat., Vol. 22 (1951), pp. 587-592.

[15] E. L. Lehmann, Testing Statistical Hypotheses, John Wiley and Sons, New York and Chapman & Hall, London, 1959.

[16] H. B. Mann and D. R. Whitney, "On a test of whether one of two random variables is stochastically larger than the other," Ann. Math. Stat., Vol. 18 (1947), pp. 50-60.

[17] Ingram Olkin and John W. Pratt, "Unbiased estimation of certain correlation coefficients," Ann. Math. Stat., Vol. 29 (1958), pp. 201-211.

[18] E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, Vol. 1, Cambridge University Press, Cambridge, 1954.

[19] Raymond P. Peterson, "Density unbiased point estimates," Ann. Math. Stat., Vol. 25 (1954), pp. 398-401.

[20] E. J. G. Pitman, "The 'closest' estimates of statistical parameters," Proc. Camb. Philos. Soc., Vol. 33 (1937), pp. 212-222.

[21] M. H. Quenouille, "Notes on bias in estimation," Biometrika, Vol. 43 (1956), pp. 353-360.

[22] V. Romanovsky, "On the moments of standard deviations and of correlation coefficient in samples from normal population," Metron, Vol. 5, no. 4 (December 31, 1925), pp. 31-46.

[23] Leonard J. Savage, The Foundations of Statistics, John Wiley and Sons, New York, 1954.

[24] Leopold Schmetterer, Einführung in die Mathematische Statistik, Springer, Wien, 1956.

[25] R. F. Tate, "Unbiased estimation: functions of location and scale parameters," Ann. Math. Stat., Vol. 30 (1959), pp. 341-366.

[26] I. Todhunter, A History of the Mathematical Theory of Probability from the Time of Pascal to that of Laplace, Chelsea, New York, 1949.

[27] F. G. Tricomi, "Asymptotische Eigenschaften der unvollständigen Gammafunktion," Mathematische Zeitschrift, Vol. 53 (1950), pp. 136-148; cf. Math. Rev., Vol. 13 (1952), p. 553.

[28] A. A. Tschuprow, Principles of the Mathematical Theory of Correlation (transl. by M. Kantorowitsch), Hodge, London, 1939.

[29] H. R. van der Vaart, "Some results on the probability distribution of the latent roots of a symmetric matrix of continuously distributed elements, and some applications to the theory of response surface estimation," Institute of Statistics, University of North Carolina Mimeo Series, No. 189 (1958), 1 + 40 p.

[30] H. R. van der Vaart, "On certain types of bias in current methods of response surface estimation," Bull. Inst. Internat. Stat., Vol. 37 (3) (1960), pp. 191-203.





MULTIVARIATE CORRELATION MODELS WITH MIXED DISCRETE 
AND CONTINUOUS VARIABLES 


By I. Olkin¹ and R. F. Tate²
Stanford University and Michigan State University; University of Washington 


1. Introduction and summary. A model which frequently arises from experi- 
mentation in psychology is one which contains both discrete and continuous 
variables. The concern in such a model may be with finding measures of associa- 
tion or with problems of inference on some of the parameters. 

In the simplest such model there is a discrete variable $x$ which takes the values 0 or 1, and a continuous variable $y$. Such a random variable $x$ is often used in psychology to denote the presence or absence of an attribute. Point-biserial correlation, which is the ordinary product-moment correlation between $x$ and $y$, has been used as a measure of association. This model, when $x$ has a binomial distribution, and the conditional distribution of $y$ for fixed $x$ is normal, was studied in some detail by Tate [13].

In the present paper, we consider a multivariate extension, in which $x = (x_0, x_1, \dots, x_k)$ has a multinomial distribution, and the conditional distribution of $y = (y_1, \dots, y_p)$ for fixed $x$ is multivariate normal.


2. Outline. Consider a random sample of $n$ independent vectors $(x_\alpha, y_\alpha)$, $\alpha = 1, \dots, n$, where $x$ has a binomial distribution, $b(1, p)$. The conditional distributions of $(y \mid x = 1)$ and $(y \mid x = 0)$ are assumed to be $N(\mu_1, \sigma^2)$ and $N(\mu_0, \sigma^2)$, respectively. If we define $\Delta = (\mu_1 - \mu_0)/\sigma$, then

\[ \rho_{xy} = \Delta\,[pq/(1 + pq\Delta^2)]^{1/2}. \]

Thus, studying $\rho$ involves studying induced relations between means; for example, $\mu_1 = \mu_0$ if and only if $\rho_{xy} = 0$. The exact and asymptotic distributions of $r_{xy}$ were obtained by Tate [13].
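
The closed form for the point-biserial correlation can be cross-checked against the defining ratio of moments. In the sketch below the function names and numerical inputs are illustrative assumptions, and $q = 1 - p$ is taken as usual.

```python
import math

def point_biserial_rho(p, mu0, mu1, sigma):
    """rho_xy = Delta * sqrt(p q / (1 + p q Delta^2)), Delta = (mu1 - mu0) / sigma."""
    q = 1.0 - p
    delta = (mu1 - mu0) / sigma
    return delta * math.sqrt(p * q / (1.0 + p * q * delta * delta))

def rho_from_moments(p, mu0, mu1, sigma):
    """The same quantity from Cov(x, y) / sqrt(V(x) V(y)), as a cross-check."""
    q = 1.0 - p
    cov_xy = p * q * (mu1 - mu0)                      # E(xy) - E(x) E(y)
    var_x = p * q
    var_y = sigma ** 2 + p * q * (mu1 - mu0) ** 2     # within + between variance
    return cov_xy / math.sqrt(var_x * var_y)

print(point_biserial_rho(0.3, 1.0, 2.5, 2.0), rho_from_moments(0.3, 1.0, 2.5, 2.0))
```

Both functions return zero exactly when the two conditional means coincide.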

We are now concerned with a multivariate analog of this model. Let $(y_{1\alpha}, \dots, y_{p\alpha}, x_{0\alpha}, \dots, x_{k\alpha})$, $\alpha = 1, \dots, n$, be a sequence of independent random vectors, where $(x_0, \dots, x_k)$ has the multinomial distribution,

\[ f(x_0, \dots, x_k) = p_0^{x_0}p_1^{x_1}\cdots p_k^{x_k}; \qquad x_m = 0, 1; \quad \sum_0^k x_m = 1, \quad 0 < p_m < 1, \quad \sum_0^k p_m = 1. \]

The conditional distribution of $y = (y_1, \dots, y_p)$ given $x_m = 1$ is assumed to be $N(\mu^{(m)}, \Sigma)$, that is, $p$-variate normal with mean vector $\mu^{(m)} = (\mu_{1m}, \dots, \mu_{pm})$, $m = 0, 1, \dots, k$, and positive definite covariance matrix $\Sigma$.

Received March 15, 1960; revised November 17, 1960. 

1 Research sponsored in part by the Office of Naval Research at Stanford University, 
and in part by the Office of Ordnance Research at Michigan State University. 


2 Research sponsored in part by the Office of Naval Research and in part by the National Science Foundation, Grant 14284, at the University of Washington.





DISCRETE-CONTINUOUS CORRELATION 449 


As in the univariate case, the vanishing of various correlations, for example 
multiple correlation coefficients, induces certain constraints on the means. In 
Section 3, we give a number of relations between correlation coefficients and 
means. It will appear that the square of a correlation coefficient may act as a 
measure of dispersion among the possible multivariate normal conditional dis- 
tributions. 

For convenience of development as well as clarity, we consider separately the cases (i) $k = 1$, $p > 1$, (ii) $k > 1$, $p = 1$, (iii) $k > 1$, $p > 1$. In connection with Case (i), it will be shown that $\rho^2_{x_1(y_1,\dots,y_p)}$ is closely related to the distance function of Mahalanobis [7]. Section 4, dealing with the relevant distribution theory for Case (i), will exhibit the relationship between $r^2_{x_1(y_1,\dots,y_p)}$ and the $T^2$ statistic of Hotelling [5], and will contain the exact and asymptotic distributions for $r_{x_1(y_1,\dots,y_p)}$. The method of derivation for the asymptotic distribution constitutes something of a departure from the usual approach, since the statistic involved is a function of sample means, but the classical method of Cramér ([2], Section 27.7) is not used, because it would involve too much calculation. The resulting distribution is formally identical to that obtained by Tate [13] for the ordinary correlation coefficient, which is an altogether surprising result.

Section 5 presents the distribution theory related to Case (ii), including derivations for the exact and asymptotic distributions of partial correlation coefficients, in addition to the main discussion of $r_{y(x_1,\dots,x_k)}$. Unfortunately, the multiple correlation coefficient has a distribution which contains nuisance parameters, that is, parameters other than $p_0, p_1, \dots, p_k$ of the $x$ distribution, and the population multiple correlation coefficient. Moreover, this difficulty does not disappear in the limit. Cramér's method, referred to in the last paragraph, is used to advantage here.

Canonical correlations are introduced in Section 3, and serve to give a unified approach for our three cases. In the general case $k > 1$, $p > 1$, however, it is difficult to obtain results. An effort is made in Section 6 to indicate the problems involved. The vector correlation $\rho_v$ between the vectors $x$ and $y$, which is also introduced in Section 3, although theoretically inferior to canonical correlations has the property that more can be accomplished with the sampling theory for its estimate $r_v$. In Section 6 it is shown that $r_v$ is essentially distributed as a $U$-statistic of Wilks [15]. A distribution of Rao [10] is important in this connection.

Throughout the paper estimates will be the natural sample counterparts of the 
parameters which they estimate. They can all be obtained by the method of 
maximum likelihood, and this is shown in Section 7. 

Section 8 contains a summary of procedures developed throughout the paper, 
together with examples of situations in which they would be appropriate. 

Moustafa [8] has made a detailed study of models employing a multivariate 
normal conditional distribution with one or more multinomial conditioning vec- 
tors. He considers cases more general than ours, and employs the asymptotic 
chi-square property of —2 log \ to perform his tests; correlation is not mentioned. 





450 I. OLKIN AND R. F. TATE 


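3. Relations between correlation coefficients and means.
3.1. Model and preliminaries. Consider the model³

\[ (y_1, \dots, y_p \mid x_m = 1,\ x_\nu = 0,\ \nu \ne m) \sim N(\mu^{(m)}, \Sigma), \qquad m = 0, 1, \dots, k, \]

and suppose that the conditional means and covariances are given as follows:

Means

\[
\begin{array}{c|cccc}
 & x_0 & x_1 & \cdots & x_k \\ \hline
y_1 & \mu_{10} & \mu_{11} & \cdots & \mu_{1k} \\
\vdots & \vdots & \vdots & & \vdots \\
y_p & \mu_{p0} & \mu_{p1} & \cdots & \mu_{pk}
\end{array}
\]

Vectors: $\mu^{(0)} = (\mu_{10}, \dots, \mu_{p0}), \ \dots, \ \mu^{(k)} = (\mu_{1k}, \dots, \mu_{pk})$.

Covariances

\[
\begin{array}{c|ccc|ccc}
 & y_1 & \cdots & y_p & x_0 & \cdots & x_k \\ \hline
y_1 & \psi_{11} & \cdots & \psi_{1p} & \delta_{10} & \cdots & \delta_{1k} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
y_p & \psi_{p1} & \cdots & \psi_{pp} & \delta_{p0} & \cdots & \delta_{pk} \\ \hline
x_0 & \delta_{10} & \cdots & \delta_{p0} & \gamma_{00} & \cdots & \gamma_{0k} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
x_k & \delta_{1k} & \cdots & \delta_{pk} & \gamma_{k0} & \cdots & \gamma_{kk}
\end{array}
\;\equiv\;
\begin{pmatrix} \Psi & \Delta \\ \Delta' & \Gamma \end{pmatrix}.
\]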
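The unconditional moments are:

\[ E y_i = \sum_{m=0}^k E(y_i \mid x_m = 1)\,p_m = \sum_{m=0}^k \mu_{im}p_m \equiv \mu_{i\cdot}, \tag{3.1} \]

\[ E y_i y_j = \sum_{m=0}^k E(y_i y_j \mid x_m = 1)\,p_m = \sum_{m=0}^k (\sigma_{ij} + \mu_{im}\mu_{jm})\,p_m = \sigma_{ij} + \sum_{m=0}^k p_m\mu_{im}\mu_{jm}. \tag{3.2} \]

Hence,

\[ \psi_{ij} = \sigma_{ij} + \sum_{m=0}^k p_m(\mu_{im} - \mu_{i\cdot})(\mu_{jm} - \mu_{j\cdot}), \tag{3.3} \]

\[ \delta_{im} = p_m(\mu_{im} - \mu_{i\cdot}), \tag{3.4} \]

\[ \gamma_{mm} = p_m q_m; \qquad \gamma_{m\nu} = -p_m p_\nu \ (m \ne \nu), \qquad q_m = 1 - p_m. \tag{3.5} \]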
If we let $U = (u_{im}) = (\mu_{im} - \mu_{i\cdot})$, $i = 1, 2, \dots, p$, $m = 0, 1, \dots, k$; $p = (p_0, p_1, \dots, p_k)$, $D_p = \mathrm{diag}(p_0, p_1, \dots, p_k)$, then (3.3)-(3.5) can be written as

\[ \Psi = \Sigma + UD_pU', \tag{3.6} \]

\[ \Delta = UD_p, \tag{3.7} \]

\[ \Gamma = D_p - p'p. \tag{3.8} \]

Note that $\Delta e' = \Gamma e' = 0$, where $e = (1, 1, \dots, 1)$: $1 \times (k+1)$. Moreover, $\Gamma w' = 0$ if and only if $w$ is a scalar multiple of $e$. Finally, $Up' = UD_pe' = 0$, and from (3.7) and (3.8) we have

\[ \Delta = U\Gamma. \tag{3.9} \]

3 $x \sim F(x)$ means that $x$ is distributed according to the d.f. $F(x)$, and $x(n) \to F(x)$ means that the asymptotic d.f. of $x(n)$ is $F(x)$.


3.2. Relations between canonical correlations and means. Consider the matrix

\[ V_\lambda = \begin{pmatrix} -\lambda\Psi & \Delta \\ \Delta' & -\lambda\Gamma \end{pmatrix}. \]

The canonical correlations (introduced by Hotelling [6]) are defined as the numbers $\lambda$ to each of which corresponds a non-trivial vector $\zeta = (\eta, \xi) = (\eta_1, \eta_2, \dots, \eta_p;\ \xi_0, \xi_1, \dots, \xi_k)$, with $V_\lambda\zeta' = 0$. Since $\Delta e' = \Gamma e' = 0$, $\zeta$ will be trivial if $\eta = 0$ and $\xi$ is a scalar multiple of $e$. Thus, if $\lambda \ne 0$, then $\lambda$ is a canonical correlation if there exist vectors $\eta \ne 0$ and $\xi$ such that

\[ \Delta\xi' = \lambda\Psi\eta', \qquad \Delta'\eta' = \lambda\Gamma\xi'. \tag{3.10} \]

Note that if $\eta = 0$, then $\Gamma\xi' = 0$, in which case $\xi$ would be a scalar multiple of $e$.

Lemma 3.1. The non-zero canonical correlations are precisely the non-zero roots of

\[ |UD_pU' - \theta\Sigma| = 0, \qquad \theta = \lambda^2/(1 - \lambda^2). \tag{3.11} \]

Remark. Since $\Psi = UD_pU' + \Sigma$ is positive definite, $\theta \ne -1$, and $\lambda^2 = \theta/(1 + \theta)$ is well-defined.

Proof. Given $|V_\lambda| = 0$, with $\lambda \ne 0$, $\eta \ne 0$, and using (3.6), (3.7), (3.9), and (3.10), we have

\[ UD_pU'\eta' = U\Delta'\eta' = \lambda U\Gamma\xi' = \lambda^2(UD_pU' + \Sigma)\eta', \]

which implies (3.11).

Conversely, suppose that (3.11) holds with $\theta \ne 0$. Then there exists $\eta \ne 0$ such that

\[ (1 - \lambda^2)\,UD_pU'\eta' = \lambda^2\Sigma\eta'. \]

It now suffices to prove that there exists a vector $\xi$ satisfying

\[ \lambda\Gamma\xi' = \Delta'\eta', \]

for then $\Delta\xi' = \lambda\Psi\eta'$ by an argument similar to the above. But such a vector does exist, since $\Gamma\xi' = 0$ only if $\xi$ is a scalar multiple of $e$, in which case $e\Delta'\eta' = 0$ for all $\eta$. ||

Theorem 3.2. The canonical correlations are zero if and only if

\[ \mu^{(0)} = \mu^{(1)} = \cdots = \mu^{(k)}. \]

Proof. Clearly $\mu_{im} = \mu_{i\nu}$ for all $i$, $m$, and $\nu$ holds if and only if $U = 0$. If $U = 0$, then (3.11) implies that $\theta = 0$. Conversely, if $\theta = 0$, then $\Sigma^{-1/2}UD_pU'\Sigma^{-1/2} = 0$, so that $\Sigma^{-1/2}UD_p^{1/2} = 0$, which in turn implies that $U = 0$. ||
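If $k > 1$, $p = 1$, then $U = (\mu_{10} - \mu_{1\cdot}, \dots, \mu_{1k} - \mu_{1\cdot})$, and there is only one non-zero root: namely, the multiple correlation coefficient. (The first subscript is not needed in this discussion, and we omit it.) Hence,

\[
\rho_0^2 \equiv \rho^2_{y(x_1,\dots,x_k)} = \frac{\sum_{m=0}^k (\mu_m - \mu_\cdot)^2\,p_m/\sigma_{11}}{1 + \sum_{m=0}^k (\mu_m - \mu_\cdot)^2\,p_m/\sigma_{11}}.
\tag{3.12}
\]

If we define $\Delta_m = (\mu_m - \mu_\cdot)(\sigma_{11})^{-1/2}$, and $\delta = \sum_0^k p_m\Delta_m^2$, then

\[ \rho_0^2 = \sum p_m\Delta_m^2 \Big/ \Big(1 + \sum p_m\Delta_m^2\Big) = \delta/(1 + \delta). \tag{3.13} \]

Also,

\[ \rho_{0m}^2 = \frac{p_m\Delta_m^2}{q_m(1 + \delta)}, \]

so that

\[ \rho_0^2 = \sum_0^k q_m\rho_{0m}^2. \tag{3.14} \]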
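
Relations (3.13) and (3.14) can be verified numerically for any choice of parameters. The sketch below is illustrative: the function name and the parameter values are assumptions, with $\rho_{0m}$ understood as the ordinary correlation between $y$ and the indicator $x_m$.

```python
def multiple_correlation_sq(p, mu, sigma2):
    """Return (rho_0^2, [rho_0m^2]) for group probabilities p, conditional
    means mu and common conditional variance sigma2 (case p = 1 of the model)."""
    mu_bar = sum(pm * m for pm, m in zip(p, mu))
    delta_sq = [(m - mu_bar) ** 2 / sigma2 for m in mu]        # Delta_m^2
    delta = sum(pm * d for pm, d in zip(p, delta_sq))          # delta of (3.13)
    rho0_sq = delta / (1.0 + delta)
    rho0m_sq = [pm * d / ((1.0 - pm) * (1.0 + delta))
                for pm, d in zip(p, delta_sq)]
    return rho0_sq, rho0m_sq

p = [0.2, 0.3, 0.5]
rho0_sq, rho0m_sq = multiple_correlation_sq(p, mu=[0.0, 1.0, 3.0], sigma2=2.0)
# Weighted sum on the right-hand side of (3.14).
weighted = sum((1.0 - pm) * r for pm, r in zip(p, rho0m_sq))
print(rho0_sq, weighted)
```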


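The multiple correlation between $y$ and a subset $(x_1, \dots, x_l)$ may also be computed: namely

\[
\rho^2_{y(x_1,\dots,x_l)} = \Big(\sum_1^l p_m(\mu_m - \mu_\cdot)^2 + \Big[\sum_1^l p_m(\mu_m - \mu_\cdot)\Big]^2\Big[1 - \sum_1^l p_m\Big]^{-1}\Big)\Big/\psi_{11}.
\tag{3.15}
\]

Now suppose $k = 1$, $p > 1$; then $U = (p_1d', -p_0d')$, where $d = (\mu^{(0)} - \mu^{(1)})$. Hence, $UD_pU' = p_0p_1d'd$, and (3.11) has one non-zero root, $\theta = p_0p_1\,d\Sigma^{-1}d'$, so that $\lambda^2 = \theta/(1 + \theta)$ is equal to

\[
\rho^2 \equiv \rho^2_{x_1(y_1,\dots,y_p)} = \frac{p_0p_1(\mu^{(0)} - \mu^{(1)})\Sigma^{-1}(\mu^{(0)} - \mu^{(1)})'}{1 + p_0p_1(\mu^{(0)} - \mu^{(1)})\Sigma^{-1}(\mu^{(0)} - \mu^{(1)})'}.
\tag{3.16}
\]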
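We now find conditions for which the partial correlation coefficient vanishes.

Theorem 3.3. Let $k > 1$, $p = 1$ and $\rho^2_{0(m+1)\cdot} \equiv \rho^2_{yx_{m+1}\cdot(x_0,\dots,x_m)}$; then $\rho_{0(m+1)\cdot} = 0$ if and only if

\[ \mu_{m+1} = \sum_{m+2}^k p_\nu\mu_\nu \Big/ \sum_{m+2}^k p_\nu. \tag{3.17} \]

Proof. From

\[ 1 - \rho^2_{0(0,\dots,m,m+1)} = [1 - \rho^2_{0(0,\dots,m)}][1 - \rho^2_{0(m+1)\cdot}], \]

we have that $\rho_{0(m+1)\cdot} = 0$ if and only if $\rho^2_{0(0,\dots,m+1)} = \rho^2_{0(0,\dots,m)}$. From (3.15) this condition holds if and only if

\[
p_{m+1}(\mu_{m+1} - \mu_\cdot)^2 + \frac{\big[\sum_0^{m+1} p_\nu(\mu_\nu - \mu_\cdot)\big]^2}{1 - \sum_0^{m+1} p_\nu} - \frac{\big[\sum_0^{m} p_\nu(\mu_\nu - \mu_\cdot)\big]^2}{1 - \sum_0^{m} p_\nu} = 0.
\]

Simplification yields

\[ (\mu_{m+1} - \mu_\cdot)\Big(1 - \sum_0^m p_\nu\Big) + \sum_0^m p_\nu(\mu_\nu - \mu_\cdot) = 0, \]

which is equivalent to (3.17).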

3.3. Vector correlation. If we regard correlation in our model as merely a measure of dispersion for the various $N(\mu^{(m)}, \Sigma)$ distributions, then we are led quite naturally to a consideration of vector correlation. This concept is due to Wilks [15], and is an extension of the correlation ratio to the multivariate case via the use of generalized variance. In the notation of our model the coefficient of vector correlation $\rho_v$ can be expressed as

\[ \rho_v^2 = 1 - \frac{|\Sigma|}{|\Sigma + UD_pU'|}. \tag{3.18} \]

It is easy to see that (3.18) reduces to (3.12) and (3.16) for $k > 1$, $p = 1$ and $k = 1$, $p > 1$, respectively.
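4. Distribution theory for the case $k = 1$, $p > 1$. Let $(y_{1\alpha}, \dots, y_{p\alpha}, x_{0\alpha}, x_{1\alpha})$, $\alpha = 1, \dots, n$, be $n$ independent random vectors, with conditional distribution $(y_{1\alpha}, \dots, y_{p\alpha} \mid x_{m\alpha} = 1) \sim N(\mu^{(m)}, \Sigma)$, $m = 0, 1$. It will be convenient to define the following statistics:

\[ n_0 = \sum_\alpha x_{0\alpha}, \qquad n_1 = \sum_\alpha x_{1\alpha}, \qquad n = n_0 + n_1, \]

\[ \bar y_j = \sum_\alpha y_{j\alpha}/n, \qquad \bar y_j^{(0)} = \sum_\alpha y_{j\alpha}x_{0\alpha}/n_0, \qquad \bar y_j^{(1)} = \sum_\alpha y_{j\alpha}x_{1\alpha}/n_1, \]

\[ \bar y^{(0)} = (\bar y_1^{(0)}, \dots, \bar y_p^{(0)}), \qquad \bar y^{(1)} = (\bar y_1^{(1)}, \dots, \bar y_p^{(1)}), \qquad S = (s_{ij}): p \times p, \]

\[ s_{ij} = \sum_{m=0}^{1}\sum_{\lambda=1}^{n_m}(y_{i\lambda}^{(m)} - \bar y_i^{(m)})(y_{j\lambda}^{(m)} - \bar y_j^{(m)})/(n - 2), \qquad i, j = 1, \dots, p, \]

where $\{y_{i\lambda}^{(m)}\}$ is the subset of $(y_{i1}, \dots, y_{in})$ for which the corresponding elements of $(x_{m1}, \dots, x_{mn})$ are equal to unity.

Corresponding to (3.16) we have as an estimate of $\rho^2 \equiv \rho^2_{x_1(y_1,\dots,y_p)}$,

\[
r^2 \equiv r^2_{x_1(y_1,\dots,y_p)} = \frac{(n_0n_1/n(n-2))(\bar y^{(0)} - \bar y^{(1)})S^{-1}(\bar y^{(0)} - \bar y^{(1)})'}{1 + (n_0n_1/n(n-2))(\bar y^{(0)} - \bar y^{(1)})S^{-1}(\bar y^{(0)} - \bar y^{(1)})'} = \frac{T^2}{n - 2 + T^2},
\tag{4.1}
\]

where

\[ T^2 = (n_0n_1/n)(\bar y^{(0)} - \bar y^{(1)})S^{-1}(\bar y^{(0)} - \bar y^{(1)})'. \]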


We can now state the following

Theorem 4.1: $Q = [(n - p - 1)/p][r^2/(1 - r^2)]$ is distributed as a mixture of noncentral $F_{p,n-p-1}(\tau^2)$ distributions, with mixing coefficients

\[ \binom{n}{n_0}p_0^{n_0}p_1^{n_1} \]

and parameter

\[ \tau^2 = \frac{n_0n_1}{np_0p_1}\Big(\frac{\rho^2}{1 - \rho^2}\Big). \]

Proof: First note that from (4.1)

\[ \Big(\frac{r^2}{1 - r^2}\Big)\frac{n - p - 1}{p} = \Big(\frac{T^2}{n - 2}\Big)\frac{n - p - 1}{p}. \]

The conditional distribution of this statistic can be obtained immediately by applying a method of Bowker's (see Anderson [1], Theorem 5.2.2): If $T^2 = YS^{-1}Y'$, where $Y \sim N(\nu, \Sigma)$, $\Sigma$: $p \times p$, and $aS = \sum_1^a Z_\alpha'Z_\alpha$, where $Z_\alpha \sim N(0, \Sigma)$, then

\[ \Big(\frac{T^2}{a}\Big)\frac{a - p + 1}{p} \sim F_{p,a-p+1}(\tau^2), \]

where $\tau^2 = \nu\Sigma^{-1}\nu'$. Now, make the correspondence $a = n - 2$, $Y = (n_0n_1/n)^{1/2}(\bar y^{(1)} - \bar y^{(0)})$, $\nu = (n_0n_1/n)^{1/2}(\mu^{(1)} - \mu^{(0)})$, and use (3.16) to compute $\tau^2$. Forming the mixture, we have

\[
f(Q) = \sum_{n_0=0}^{n}\sum_{h=0}^{\infty}\frac{e^{-\tau^2/2}(\tau^2/2)^h}{h!}\,f_{p+2h,\,n-p-1}(Q)\binom{n}{n_0}p_0^{n_0}p_1^{n_1}, \qquad \tau^2 = \frac{n_0n_1\rho^2}{np_0p_1(1 - \rho^2)}. \ ||
\]

Note that if $\rho^2 = 0$, $T^2$ has the $T^2$ distribution of Hotelling [5]; and, since the $n_0$ and $n_1$ sum out, $Q$ has an ordinary $F_{p,n-p-1}$ distribution.
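4.1. Asymptotic distribution. Define

\[ g(r) = \frac{r^2}{1 - r^2} = \frac{p}{n - p - 1}\,Q, \qquad h(r) = \{g(r)/[1 + g(r)]\}^{1/2} = r. \]

Then

\[ h(r) \to N\Big(h(\rho),\ \frac1n\Big(\frac{dh}{dg}\Big)^2\Big|_{\rho}\ \lim_{n\to\infty} nV(g(r))\Big). \tag{4.2} \]

We now determine the various factors:

\[ h(\rho) = \rho, \qquad g(\rho) = \rho^2/(1 - \rho^2), \qquad \Big(\frac{dh}{dg}\Big)^2\Big|_{\rho} = [4g(\rho)(1 + g(\rho))^3]^{-1} = (1 - \rho^2)^4/4\rho^2, \]

\[ E[F_{c,d}(\tau^2)] = \frac{d}{d - 2}\Big(1 + \frac{E\tau^2}{c}\Big), \tag{4.3} \]

\[ E[F^2_{c,d}(\tau^2)] = \frac{d^2(c + 2)}{c(d - 2)(d - 4)}\Big[1 + \frac{E\tau^4 + 2(c + 2)E\tau^2}{c(c + 2)}\Big], \tag{4.4} \]

\[ E\tau^2 = [np_0p_1(1 - \rho^2)]^{-1}\rho^2\,En_0n_1 = (n - 1)g(\rho), \]

\[ E\tau^4 = (np_0p_1)^{-2}g^2(\rho)\,En_0^2n_1^2 = (n - 1)g^2(\rho)[(n - 2)(n - 3)p_0p_1 + n - 1]/np_0p_1. \]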
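
The two moments of $\tau^2$ can be confirmed by summing over the binomial mixing coefficients. The sketch below assumes $\tau^2 = n_0n_1\rho^2/[np_0p_1(1-\rho^2)]$ with $n_0 \sim b(n, p_0)$, as the surrounding text indicates; the function name and test values are illustrative.

```python
import math

def tau_moments(n, p0, rho):
    """Return (E tau^2, E tau^4) by direct summation over n0 ~ b(n, p0)."""
    p1 = 1.0 - p0
    g = rho * rho / (1.0 - rho * rho)
    first = second = 0.0
    for n0 in range(n + 1):
        n1 = n - n0
        weight = math.comb(n, n0) * p0 ** n0 * p1 ** n1   # mixing coefficient
        tau_sq = n0 * n1 * g / (n * p0 * p1)              # noncentrality
        first += weight * tau_sq
        second += weight * tau_sq ** 2
    return first, second

n, p0, rho = 20, 0.3, 0.5
g = rho ** 2 / (1.0 - rho ** 2)
e_tau2, e_tau4 = tau_moments(n, p0, rho)
closed_tau2 = (n - 1) * g
closed_tau4 = ((n - 1) * g ** 2 * ((n - 2) * (n - 3) * p0 * (1 - p0) + n - 1)
               / (n * p0 * (1 - p0)))
print(e_tau2, closed_tau2, e_tau4, closed_tau4)
```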





With $c = p$, $d = n - p - 1$, after considerable calculation we obtain

\[ \lim_{n\to\infty} nV(g(r)) = g^2(\rho)\,\frac{1 - 2p_0p_1}{p_0p_1} + 4g(\rho), \]

which upon substitution in (4.2) yields

Theorem 4.2:

\[ r \to N\Big(\rho,\ \frac{(1 - \rho^2)^2}{4np_0p_1}\,[4p_0p_1 + \rho^2(1 - 6p_0p_1)]\Big). \tag{4.5} \]

It is rather surprising that the asymptotic variance is independent of $p$, the number of variates in $(y_1, \dots, y_p)$, except insofar as it affects $\rho = \rho_{x_1(y_1,\dots,y_p)}$. As a consequence, (4.5) is identical in form with the result of Tate ([13], Th. 1) for $p = 1$. Thus, we can apply some of the results of that paper. In particular $V_\infty(r)$ has a minimum for each $\rho$ when $p_0 = p_1 = \tfrac12$, in which case $V_\infty(r) = (1 - \rho^2)^2(2 - \rho^2)/2n$. By a variance stabilizing transformation we obtain (when $p_0 = \tfrac12$)

\[ \tanh^{-1}[r^2(2 - r^2)]^{1/2} \to N\{\tanh^{-1}[\rho^2(2 - \rho^2)]^{1/2},\ 2/n\}. \]
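
A short numerical check of these statements follows. It takes the asymptotic variance in the form $(1-\rho^2)^2[4p_0p_1 + \rho^2(1 - 6p_0p_1)]/(4np_0p_1)$, which is the reading of (4.5) assumed here and which agrees with the value quoted for $p_0 = p_1 = \tfrac12$. The function names are hypothetical, and the delta-method check uses a central difference rather than an analytic derivative.

```python
import math

def asymptotic_variance(rho, p0, n):
    """Assumed form of the variance in (4.5)."""
    a = p0 * (1.0 - p0)
    return (1 - rho ** 2) ** 2 * (4 * a + rho ** 2 * (1 - 6 * a)) / (4 * n * a)

def stabilized(r):
    """tanh^{-1} [r^2 (2 - r^2)]^{1/2}, the transformation for p0 = 1/2."""
    return math.atanh(math.sqrt(r * r * (2.0 - r * r)))

def delta_method_variance(rho, n, step=1e-6):
    """(d stabilized / d r)^2 * V(r) at r = rho, for p0 = 1/2."""
    slope = (stabilized(rho + step) - stabilized(rho - step)) / (2 * step)
    return slope ** 2 * asymptotic_variance(rho, 0.5, n)

n = 100
for rho in (0.2, 0.6, 0.9):
    print(rho, asymptotic_variance(rho, 0.5, n), delta_method_variance(rho, n))
```

The transformed variance is close to $2/n$ whatever the value of $\rho$, and unequal group probabilities give a larger variance than $p_0 = \tfrac12$.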


In a recent paper, Hooper [4] considered the following model. Let 
(Ya; Tia, ***,Lha), @ = 1, +++, mn, be n independent random vectors, with 


Ya = 2 mtrat Ua, Dra = bra t ra; AN=1,---,A; teen, 


where &. are real numbers, (wia,-** , @Aa) are independent observations from 
a A-variate normal distribution with zero mean, independent of u.,a = 1,---, 
n, which are independent normal variates with zero means. If >> & = 1, then 
the asymptotic variance of the multiple correlation coefficient is $(1 - \rho^2)^2(2 - \rho^2)/2n$, which is the same as $V_\infty(r)$ when $p_0 = p_1 = \tfrac12$. Although the results are the same, the connection, if any, between the two models is obscure.


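5. Distribution theory for the Case $k > 1$, $p = 1$. Let $(y_\alpha, x_{0\alpha}, \dots, x_{k\alpha})$, $\alpha = 1, \dots, n$, be $n$ independent random vectors, where the conditional distribution of $(y_\alpha \mid x_{m\alpha} = 1) \sim N(\mu_m, \sigma^2)$, $m = 0, 1, \dots, k$. It will be convenient to define the following statistics:

\[ n_m = \sum_\alpha x_{m\alpha}, \qquad n = \sum_0^k n_m, \qquad \bar y = \sum_\alpha y_\alpha/n, \]

\[ \bar y^{(m)} = \sum_\alpha y_\alpha x_{m\alpha}/n_m = \sum_{\lambda=1}^{n_m} y_\lambda^{(m)}/n_m, \]

where $\{y_\lambda^{(m)}\}$ is the subset of $(y_1, \dots, y_n)$ for which the corresponding elements of $(x_{m1}, \dots, x_{mn})$ are equal to unity.

\[ \overline{x_my} = \sum_\alpha y_\alpha x_{m\alpha}/n, \qquad \bar x_m = \sum_\alpha x_{m\alpha}/n, \]

\[ \mu_\cdot = \sum_0^k p_m\mu_m, \qquad \Delta_m = (\mu_m - \mu_\cdot)/\sigma, \qquad \delta = \sum_0^k p_m\Delta_m^2. \]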







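5.1. Multiple Correlation Coefficient: Exact distribution. Corresponding to (3.12), (3.13), (3.14), we have several equivalent forms for the estimator of $\rho_0^2$:

\[ r_0^2 \equiv r^2_{y(x_1,\dots,x_k)} = n\sum_{m=0}^k s_{ym}^2/n_ms^2, \tag{5.1} \]

where $s^2 = \sum_\alpha (y_\alpha - \bar y)^2/n$, $s_{ym} = (1/n)\sum_\alpha (y_\alpha - \bar y)(x_{m\alpha} - \bar x_m) = n_m(\bar y^{(m)} - \bar y)/n$,

\[ r_0^2 = (s^2)^{-1}\sum_0^k (\overline{x_my} - \bar x_m\bar y)^2/\bar x_m, \tag{5.2} \]

\[ r_0^2 = \sum_0^k (1 - \bar x_m)\,r_{0m}^2, \tag{5.3} \]

where $r_{0m} = r_{yx_m}$.

Using (5.1) we obtain

\[
\frac{r_0^2}{1 - r_0^2} = \frac{\sum_0^k n_m(\bar y^{(m)} - \bar y)^2}{\sum_1^n (y_\alpha - \bar y)^2 - \sum_0^k n_m(\bar y^{(m)} - \bar y)^2}
= \frac{\sum_0^k n_m(\bar y^{(m)} - \bar y)^2}{\sum_{m=0}^k\sum_{\lambda=1}^{n_m}(y_\lambda^{(m)} - \bar y^{(m)})^2}.
\]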

In view of the above we have, analogous to Theorem 4.1,

Theorem 5.1. The statistic

\[ Z = \frac{n - k - 1}{k}\cdot\frac{r_0^2}{1 - r_0^2} \]

is distributed as a mixture of noncentral $F_{k,n-k-1}(\tau^2)$ distributions, with mixing coefficients

\[ \frac{n!}{n_0!\,n_1!\cdots n_k!}\,p_0^{n_0}p_1^{n_1}\cdots p_k^{n_k} \]

and parameter $\tau^2 = \sum_0^k n_m\Delta_m^2$.

Proof. Follow the same type of argument as in the proof of Theorem 4.1.

Note that again for this case we have $\tau^2 = 0$ whenever $\rho_0 = 0$, and hence $Z \sim F_{k,n-k-1}$, which is a well-known result (see Fisher [3]).

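5.1.1. Asymptotic Distribution. We now need certain computations for moments. Using

\[
E\,x_m^a x_\nu^b y^c =
\begin{cases}
\sum_0^k p_m E(y^c \mid x_m = 1), & \text{if } a = b = 0, \\
0, & m \ne \nu, \\
p_m E(y^c \mid x_m = 1), & m = \nu,
\end{cases}
\]

we find that

\[
\begin{aligned}
& V(x_m) = p_mq_m; \qquad \mathrm{Cov}(x_m, x_\nu) = -p_mp_\nu \ (m \ne \nu), \\
& V(y) = 1 + \delta, \qquad V(y^2) = \sum p_m\Delta_m^4 - \delta^2 + 4\delta + 2, \\
& \mathrm{Cov}(y, y^2) = \sum p_m\Delta_m^3, \qquad V(x_my) = p_m(1 + q_m\Delta_m^2), \\
& \mathrm{Cov}(x_m, x_my) = p_mq_m\Delta_m; \qquad \mathrm{Cov}(x_m, x_\nu y) = -p_mp_\nu\Delta_\nu \ (m \ne \nu), \\
& \mathrm{Cov}(x_m, y) = p_m\Delta_m, \qquad \mathrm{Cov}(x_m, y^2) = p_m(\Delta_m^2 - \delta), \\
& \mathrm{Cov}(x_my, y) = p_m(1 + \Delta_m^2), \qquad \mathrm{Cov}(x_my, x_\nu y) = -p_mp_\nu\Delta_m\Delta_\nu \ (m \ne \nu), \\
& \mathrm{Cov}(x_my, y^2) = p_m\Delta_m(\Delta_m^2 - \delta + 2).
\end{aligned}
\]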

If we write $r_0^2$ as in (5.2), we have a function of sample moments, and we can expand about the population moments (see Cramér [2], p. 353). This leads to the following asymptotic result.

Theorem 5.2.

\[ r_0 \to N\Big(\rho_0,\ \frac{\sum_0^k p_m\Delta_m^4 + \delta^2 + 4\delta}{4n\delta(1 + \delta)^3}\Big). \]

Alternative forms for the asymptotic variance are

\[ V_\infty(r_0) = \frac{(1 - \rho_0^2)^2}{n}\Big[\frac{(1 - \rho_0^2)^2\sum p_m\Delta_m^4}{4\rho_0^2} + 1 - \tfrac34\rho_0^2\Big], \]

\[ V_\infty(r_0) = \frac{(1 - \rho_0^2)^2}{n}\Big[\frac{\sum q_m^2\rho_{0m}^4/p_m}{4\rho_0^2} + 1 - \tfrac34\rho_0^2\Big], \]

since $\rho_{0m}^2 = p_m\Delta_m^2(1 - \rho_0^2)/q_m$. The term $\sum q_m^2\rho_{0m}^4/p_m$ contains the nuisance parameters; we can, however, look at some bounds. We find that

\[ \rho_0^4 \le \sum q_m^2\rho_{0m}^4/p_m \le \rho_0^4\Big(\sum p_m^{-1} - 2k - 1\Big). \]

The right inequality follows from $\rho_{0m} \le \rho_0$ and the left inequality follows from the Cauchy inequality

\[ \sum[(q_m\rho_{0m}^2)\,p_m^{-1}(q_m\rho_{0m}^2)]\sum p_m \ge \Big(\sum q_m\rho_{0m}^2\Big)^2 = \rho_0^4. \]

Thus,

\[
\frac{(1 - \rho_0^2)^2(2 - \rho_0^2)}{2n} \le V_\infty(r_0) \le \frac{(1 - \rho_0^2)^2(2 - \rho_0^2)}{2n}\Big[1 + \frac{\rho_0^2}{2(2 - \rho_0^2)}\Big(\sum_0^k \frac{1}{p_m} - 2(k + 1)\Big)\Big].
\]

For fixed $\rho_0^2$ and $k$ the right-hand side takes on its minimum value of

\[ \frac{(1 - \rho_0^2)^2(2 - \rho_0^2)}{2n}\Big[1 + \frac{\rho_0^2(k^2 - 1)}{2(2 - \rho_0^2)}\Big] \]

when $p_m = 1/(k + 1)$, which in turn of course reduces to the left-hand side, and the result of Tate [13], when $k = 1$.
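
The bounds on the asymptotic variance can be explored numerically. The sketch below takes the variance in the form $[\sum p_m\Delta_m^4 + \delta^2 + 4\delta]/[4n\delta(1+\delta)^3]$ and the upper bound in the form $\frac{(1-\rho_0^2)^2(2-\rho_0^2)}{2n}\big[1 + \rho_0^2(\sum p_m^{-1} - 2(k+1))/(2(2-\rho_0^2))\big]$; both are readings assumed for this illustration, and the function name and parameter values are hypothetical.

```python
import math

def variance_with_bounds(p, mu, sigma2, n):
    """Return (V, lower, upper) for the asymptotic variance of r_0."""
    k = len(p) - 1
    mu_bar = sum(pm * m for pm, m in zip(p, mu))
    d = [(m - mu_bar) / math.sqrt(sigma2) for m in mu]       # Delta_m
    delta = sum(pm * dm ** 2 for pm, dm in zip(p, d))
    fourth = sum(pm * dm ** 4 for pm, dm in zip(p, d))
    rho_sq = delta / (1.0 + delta)                           # rho_0^2
    v = (fourth + delta ** 2 + 4 * delta) / (4 * n * delta * (1 + delta) ** 3)
    lower = (1 - rho_sq) ** 2 * (2 - rho_sq) / (2 * n)
    excess = sum(1.0 / pm for pm in p) - 2 * (k + 1)
    upper = lower * (1 + rho_sq * excess / (2 * (2 - rho_sq)))
    return v, lower, upper

# Three unequal groups: the variance lies strictly between the bounds.
v3, lo3, up3 = variance_with_bounds([0.2, 0.3, 0.5], [0.0, 1.0, 3.0], 2.0, 100)
# Two equal groups (k = 1): both bounds coincide with the variance.
v2, lo2, up2 = variance_with_bounds([0.5, 0.5], [0.0, 2.0], 1.0, 100)
print(v3, lo3, up3)
print(v2, lo2, up2)
```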


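5.2. Partial correlation coefficient: Exact distribution. We first find an expression for $r_{0(m+1)\cdot} \equiv r_{yx_{m+1}\cdot(x_0,\dots,x_m)}$.

Theorem 5.3.

\[
\frac{r^2_{0(m+1)\cdot}}{1 - r^2_{0(m+1)\cdot}} =
\frac{\Big(n_{m+1}\sum_{m+2}^k n_\nu \Big/ \sum_{m+1}^k n_\nu\Big)(\bar y^{(m+1)} - \bar y_0)^2}
{\sum_{\nu=0}^k\sum_{\lambda=1}^{n_\nu}(y_\lambda^{(\nu)} - \bar y^{(\nu)})^2 + \sum_{m+2}^k n_\nu(\bar y^{(\nu)} - \bar y_0)^2},
\]

where $\bar y_0 = \sum_{m+2}^k n_\nu\bar y^{(\nu)} \big/ \sum_{m+2}^k n_\nu$.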
Proor. From 1 — rocm4s- 
ple estimator of (3.15): 


,= [Saw 


2 
T0(0,1,+++,m 


we obtain 


Tocm+i)- 1 
hm (GO — GF 
1 —Toumt): 


0 


SUE sy 


and finally 


__ Fem: 


¥ > (yf? — 


2 
= (1 — 1T0(0,1,++- 


gj”)? + a n(g” — Go)” 


ym+1))/[1 — 70(0.1,-++,m)]> and the sam- 


—g) + (: - X p.) £ bg” — y|/* 


( m+1 
ial 
wee oP a. 


(9 — 9 








= To (m+) - 


using the relation ot’ #,(g” — g) = 


k ny, 
Dd (yw - 


y=) A=] 


Simplifying, 


m+2 


k Ny 
=) L(y - 


vex) A=] 


(5.6) 


— Linas BG 
m+1 2 
nD =X (ve ~ 9) — X mig” — 9) —|¥ mtg” - 9) | 


. 
gp”) + alg” — 9) - 


a 


g”) > yn (g” 


— gj). In the above 


k 
dM 
m+2 
m+1 


2 nig” — 9) 


gj)? 





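Substitution of $\hat p_\nu = n_\nu/n$ and (5.6) into (5.5) leads to the result.

In particular, if $m + 1 = k - 1$, then,

\[
\frac{(n - k - 1)^{1/2}\,r_{0(k-1)\cdot}}{(1 - r^2_{0(k-1)\cdot})^{1/2}} =
\frac{\Big(\dfrac{n_{k-1}n_k}{n_{k-1} + n_k}\Big)^{1/2}(\bar y^{(k-1)} - \bar y^{(k)})}
{\Big[\sum_{\nu=0}^k\sum_{\lambda=1}^{n_\nu}(y_\lambda^{(\nu)} - \bar y^{(\nu)})^2/(n - k - 1)\Big]^{1/2}}
\tag{5.7}
\]

has a Student's $t$-distribution with $n - k - 1$ degrees of freedom, when $\rho \equiv \rho_{0(k-1)\cdot} = 0$, and a mixture of non-central $t_{n-k-1}$-distributions with parameter

\[ \Big[\frac{\rho^2}{1 - \rho^2}\cdot\frac{p_{k-1} + p_k}{p_{k-1}p_k}\cdot\frac{n_{k-1}n_k}{n_{k-1} + n_k}\Big]^{1/2} \]

and mixing coefficients

\[ \binom{n}{n_{k-1},\,n_k}\,p_{k-1}^{n_{k-1}}p_k^{n_k}(1 - p_{k-1} - p_k)^{n - n_{k-1} - n_k}. \]

Note that $\rho = 0$ if and only if $\mu_{k-1} = \mu_k$.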

5.2.1. Asymptotic distribution. The asymptotic distribution of $r_{0(k-1)\cdot} \equiv r$ will be obtained by an argument paralleling that of Theorem 4.2. Let

\[ g(r) = \frac{r^2}{1 - r^2} = \frac{1}{n - k - 1}\,F_{1,n-k-1}, \qquad h(r) = \{g(r)/[1 + g(r)]\}^{1/2} = r; \]

then the limiting distribution of $h(r)$ is given by (4.2). As before we find $\lim_{n\to\infty} nV(g(r)) = \lim_{n\to\infty} nd^{-2}V(F_{1,d}(\tau^2))$, using (4.3) and (4.4) with $c = 1$, $d = n - k - 1$. However, now the definition of the non-centrality parameter

\[ \tau^2 = g(\rho)\,\beta n_{k-1}n_k/\alpha(n_{k-1} + n_k), \qquad \alpha = p_{k-1}p_k, \quad \beta = p_{k-1} + p_k, \tag{5.8} \]

is different.

Lemma 5.4.

\[ E\tau^2 = g(\rho)[n - 1/\beta]. \]

Proof.

\[ E\{n_{k-1}n_k/(n_{k-1} + n_k)\} = E[E(n_{k-1}n_k/m \mid n_{k-1} + n_k = m)] = E(m - 1)\beta^{-1}p_{k-1}(1 - \beta^{-1}p_{k-1}) = \alpha\beta^{-2}(Em - 1). \]

Since $m \sim b(n, \beta)$, $Em = n\beta$.
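
The conditioning argument in the proof can be checked by brute-force enumeration of the trinomial counts. In the sketch below the function name and parameter values are illustrative; the ratio is taken as zero when $n_{k-1} + n_k = 0$, which is why the enumerated value differs from $\alpha\beta^{-2}(n\beta - 1)$ by the small term $\alpha\beta^{-2}(1 - \beta)^n$.

```python
import math

def expected_ratio(n, p_a, p_b):
    """E[n_a n_b / (n_a + n_b)] over trinomial counts (n_a, n_b, rest),
    with the ratio taken as 0 when n_a + n_b = 0."""
    p_rest = 1.0 - p_a - p_b
    total = 0.0
    for n_a in range(n + 1):
        for n_b in range(n - n_a + 1):
            if n_a + n_b == 0:
                continue
            n_rest = n - n_a - n_b
            ways = math.factorial(n) // (math.factorial(n_a)
                                         * math.factorial(n_b)
                                         * math.factorial(n_rest))
            prob = ways * p_a ** n_a * p_b ** n_b * p_rest ** n_rest
            total += prob * n_a * n_b / (n_a + n_b)
    return total

n, p_a, p_b = 30, 0.2, 0.3
alpha, beta = p_a * p_b, p_a + p_b
enumerated = expected_ratio(n, p_a, p_b)
lemma_value = alpha / beta ** 2 * (n * beta - 1)
print(enumerated, lemma_value)
```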


Lemma 5.5.

\[ E\tau^4 = ng^2(\rho)\,\frac{(n - 1)\alpha\beta + \beta^2 - 5\alpha + O(1/n)}{\alpha\beta}. \]

Proof. Let $x = n_{k-1}/n$, $y = n_k/n$, $b(x, y) = x^2y^2/(x + y)^2$, then $E\tau^4 = n^2\beta^2\alpha^{-2}g^2(\rho)\,Eb(x, y)$. Now expand $b(x, y)$ in a Taylor's series about $(p_{k-1}, p_k)$ to second degree terms. We have $b_x = 2xy^3/(x + y)^3$, $b_{xx} = 2y^3(y - 2x)/(x + y)^4$, $b_{xy} = 6x^2y^2/(x + y)^4$. After simplification

\[ b(x, y) = \beta^{-4}[x^2p_k^3(p_k - 2p_{k-1}) + y^2p_{k-1}^3(p_{k-1} - 2p_k) + 6\alpha^2xy] + R. \]

Since $Ex^2 = [(n - 1)p_{k-1}^2 + p_{k-1}]/n$, $Exy = (n - 1)\alpha/n$, $ER = O(1/n^2)$, we obtain

\[ Eb(x, y) = \alpha[\alpha\beta(n - 1) + \beta^2 - 5\alpha + O(1/n)]/n\beta^3. \]


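Theorem 5.6.

\[ r_{0(k-1)\cdot} \to N\Big(\rho,\ \frac{(1 - \rho^2)^2}{n}\Big[1 + \rho^2\,\frac{\beta^2 - 3\alpha\beta - 3\alpha}{4\alpha\beta}\Big]\Big). \]

Proof. Using (4.3), (4.4), Lemmas 5.4 and 5.5,

\[
\begin{aligned}
V(g(r)) = d^{-2}V(F_{1,d})
&= [(d - 2)(3 + E\tau^4 + 6E\tau^2) - (d - 4)(1 + E\tau^2)^2]/(d - 2)^2(d - 4) \\
&= [2n^2g^2(\rho) + n(d - 2)g^2(\rho)(-\alpha\beta + \beta^2 - 5\alpha)/\alpha\beta + 4n(d - 1)g(\rho) \\
&\qquad + 2n(d - 4)g^2(\rho)/\beta + O(n)]/(d - 2)^2(d - 4).
\end{aligned}
\]

Recall that $d = n - k - 1$, so that

\[ \lim_{n\to\infty} nV(g(r)) = g^2(\rho)(\alpha\beta + \beta^2 - 3\alpha)/\alpha\beta + 4g(\rho). \]

The remaining computations follow directly from (4.2).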


Remarks. The parameter $\rho$ may be removed from the variance by the variance stabilizing transformation $\phi(x)$ which satisfies the equation

\[ \phi'(x) = [(1 - x^2)(1 + cx^2)^{1/2}]^{-1}, \qquad c = (\beta^2 - 3\alpha\beta - 3\alpha)/(4\alpha\beta). \]

The desired solution is

\[ \phi(x) = \frac{1}{(c + 1)^{1/2}}\tanh^{-1}\frac{x(c + 1)^{1/2}}{(1 + cx^2)^{1/2}}, \tag{5.9} \]

or, equivalently,

\[ \phi(x) = \frac{1}{2(c + 1)^{1/2}}\tanh^{-1}\frac{2x[(c + 1)(1 + cx^2)]^{1/2}}{1 + (2c + 1)x^2}. \]

If $p_{k-1} = p_k = \tfrac12$, then $c = -\tfrac12$, and $\phi(x) = 2^{1/2}\tanh^{-1}x(2 - x^2)^{-1/2} = 2^{-1/2}\tanh^{-1}[1 - (1 - x^2)^2]^{1/2}$, which coincides with the result of [13]. In general, then, we obtain
THEOREM 5.7. 


-1 Tory. (1+ ¢)* ( -1p(1 +c)? 14 ‘) 
_ (1 + erzu—y.)! at, a (1+ cp*)!’ n . 


with c = (6 — 3a8 — 3a)/4a8, a = Pr-ipPr » B = Pr-r + Mr - 


(5.10) o(xz) = 
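The two closed forms of the variance stabilizing transformation can be compared numerically. The following is a minimal sketch, assuming the forms $\phi(x) = (c+1)^{-1/2}\tanh^{-1}[x(c+1)^{1/2}(1+cx^2)^{-1/2}]$ and its double-angle equivalent; the function names are hypothetical, and the code only checks the differential equation and the special case $c = -\tfrac12$ at sample points.

```python
import math


def c_from_probs(p_prev, p_last):
    """c = (beta^2 - 3*alpha*beta - 3*alpha) / (4*alpha*beta)."""
    alpha = p_prev * p_last
    beta = p_prev + p_last
    return (beta**2 - 3 * alpha * beta - 3 * alpha) / (4 * alpha * beta)


def phi(x, c):
    """First form: (c+1)^(-1/2) * atanh(x*sqrt(c+1)/sqrt(1+c*x^2)), for |x| < 1."""
    return math.atanh(x * math.sqrt(c + 1) / math.sqrt(1 + c * x * x)) / math.sqrt(c + 1)


def phi_alt(x, c):
    """Equivalent form obtained from atanh(u) = 0.5 * atanh(2u / (1 + u^2))."""
    inner = 2 * x * math.sqrt((c + 1) * (1 + c * x * x)) / (1 + (2 * c + 1) * x * x)
    return math.atanh(inner) / (2 * math.sqrt(c + 1))


def phi_prime_target(x, c):
    """Right-hand side of the defining equation: 1 / ((1-x^2) * sqrt(1+c*x^2))."""
    return 1.0 / ((1 - x * x) * math.sqrt(1 + c * x * x))
```

A central difference of `phi` should agree with `phi_prime_target` to within discretization error, which is a quick way to catch a misplaced exponent when transcribing the formula.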







6. Case k > 1, p > 1.
6.1. Remarks on sample canonical correlations. Define bim = (gs 9:), 


B = (bin): p Xk +1, D = diag (no/n, --- , n/n) 
k tm 
oi = De ee, (uk > a”) (yx re 9;"), S = (8i;):p Xx P; 

hi; = doo Mmbimbjm/n, H = (hij):p X p. Then H = BDB’ is an estimate of 
UD,U’, and S/n is an estimate of 2, so that we are interested in the distribution 
of the roots of |H — @S/n| = 0 or equivalently the roots of is*as? — 6/n| = 0. 
Even for the simplest model with uw” = --- = uw”, > = I, and ny ttt, Mm 
fixed, this problem remains unsolved. The reduction of the following paragraph 
will serve to focus attention on the difficulties. 


Let B ~ N(0, J), S ~ Wishart (J, p, n), that is 
(6.1) p(B, S) = const |S|""?-” exp [—4 tr (BB’ + S)]. 
Let L = S*BD'. The Jacobian is |S|***?|D|~?”, and 
p(L, S) = const. |S|‘"~?*”” exp [—3 tr S(LD"'L’ + 1)). 
Integration over the domain S > 0 yields 


(6.2) p(L) = const. |D\-??|7 + LDL" ***)" 


= const. |D|°**"?*"|D + L'L'***tP?, 

Our concern is then to obtain the distribution of the characteristic roots of LL’. 

Except for the cases k = 1 or p = 1, this problem is intractable, for it involves

the evaluation of integrals of the type f g(T)|'D,T’ — D,.| “dT over the domain 

rr’ = J, where D, , D, are diagonal matrices, and g(T) is a function of T alone; 
it is the determinant in the integral which causes the problem. 

6.2. The sample vector correlation. The sample counterpart of (3.18), namely

(6.3)
$$r^2 = 1 - \frac{|S/n|}{|H + S/n|},$$

using the notation of Section 6.1, is called the sample vector correlation coefficient.
For the case $\rho > 0$ nothing is known of the distribution. It would of course
be some kind of mixture, and would in general contain nuisance parameters.
Under the null hypothesis it is known that $1 - r^2 \sim U_{p,k,n-k-1}$,
where $U_{p,k,n-k-1}$ is a U-statistic of Wilks (see Anderson [1], ch. 9.7). The exact
distribution is available for all p, k values such that at least one of the two quantities
is 1, 2, or 3. For the case p = k = 3 a table is available for small values
of n (Anderson [1], ch. 8.5). For larger sample values the asymptotic distribution
of Rao [10] can be adapted to our situation. Denote by $P_x(z)$ the distribution
function of the random variable x evaluated at z. Then

(6.4)
$$P_{-\nu\log(1-r^2)}(z) = P_{\chi^2_{pk}}(z) + \frac{\gamma}{\nu^2}\bigl\{P_{\chi^2_{pk+4}}(z) - P_{\chi^2_{pk}}(z)\bigr\} + O(n^{-4}),$$

where

$$\nu = n - \tfrac12(p + k + 3), \qquad \gamma = (pk/48)(p^2 + k^2 - 5).$$
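The approximation (6.4) is straightforward to evaluate once a chi-square distribution function is available. The sketch below is illustrative only: it assumes the expansion in the form $P_{\chi^2_{pk}}(z) + (\gamma/\nu^2)\{P_{\chi^2_{pk+4}}(z) - P_{\chi^2_{pk}}(z)\}$ with $\nu$ and $\gamma$ as defined above, and it computes the chi-square distribution function from the standard power series for the regularized incomplete gamma function rather than from tables. The function names are hypothetical.

```python
import math


def chi2_cdf(z, df):
    """Chi-square distribution function via the series for P(df/2, z/2)."""
    if z <= 0:
        return 0.0
    a, x = df / 2.0, z / 2.0
    term = 1.0 / a
    total = term
    k = 1
    # Sum x^k / (a (a+1) ... (a+k)) until the terms are negligible.
    while abs(term) > 1e-16 * abs(total):
        term *= x / (a + k)
        total += term
        k += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def rao_tail_probability(r2, n, p, k):
    """Approximate null tail probability of -nu*log(1 - r2) from the expansion (6.4)."""
    nu = n - 0.5 * (p + k + 3)
    gamma = p * k * (p**2 + k**2 - 5) / 48.0
    z = -nu * math.log(1.0 - r2)
    f = p * k
    cdf = chi2_cdf(z, f) + gamma / nu**2 * (chi2_cdf(z, f + 4) - chi2_cdf(z, f))
    return 1.0 - cdf
```

The series converges for every positive argument but becomes slow for very large `z`; for routine use a library routine for the incomplete gamma function would be the more robust choice.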


7. Estimation of parameters. It has been tacitly assumed throughout the paper
that the parameters $\mu_{im}$ and $\sigma_{ij}$ may be estimated by their corresponding sample
means, and the parameters $p_m$ by their corresponding relative frequencies, regardless
of any dependence which might exist between the random vectors $y_\alpha$
and $x_\alpha$. The purpose of this section is to show that the assumption is valid if
the method of maximum likelihood is used.

Let $h(y_\alpha, x_\alpha)$ be the density of the random vector $(y_{1\alpha}, y_{2\alpha}, \dots, y_{p\alpha};
x_{0\alpha}, x_{1\alpha}, \dots, x_{k\alpha})$, and $f(x_\alpha)$ the density of $(x_{0\alpha}, x_{1\alpha}, \dots, x_{k\alpha})$. Also, denote
by $\phi_m(y_\alpha)$ the density of a $N(\mu^{(m)}, \Sigma)$ random vector; recall that $\mu^{(m)}$ is a column
vector of the matrix $(\mu_{im})$. According to our model

(7.1)
$$f(x_\alpha) = p_0^{x_{0\alpha}} \cdots p_k^{x_{k\alpha}}, \qquad h(y_\alpha \mid x_\alpha) = \sum_{m=0}^{k} x_{m\alpha}\phi_m(y_\alpha).$$

Therefore, the density of the whole sample is

(7.2)
$$\prod_\alpha h(y_\alpha, x_\alpha) = \prod_\alpha \Bigl(\sum_m x_{m\alpha}\phi_m(y_\alpha)\Bigr)\, p_0^{n_0} \cdots p_k^{n_k}.$$


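Now change the $\alpha$-labels in such a way that they start with the $\alpha$-values for
which $x_{0\alpha} = 1$, then those for which $x_{1\alpha} = 1$; more precisely, assume

$$x_{m\alpha} = 1, \qquad \alpha = A_m + 1, \dots, A_m + n_m, \qquad A_m = \sum_{j=0}^{m-1} n_j.$$

Then, let $y_\alpha^{(m)} = (y_{1\alpha}^{(m)}, \dots, y_{p\alpha}^{(m)})$ be independent $N(\mu^{(m)}, \Sigma)$ random vectors.
It is now easy to see that (7.2) becomes

(7.3)
$$\prod_\alpha h(y_\alpha, x_\alpha) = p_0^{n_0} \cdots p_k^{n_k} \prod_{m=0}^{k}\prod_{\alpha=1}^{n_m} \phi_m\bigl(y_\alpha^{(m)}\bigr).$$

Thus, the joint density is factorable; one factor contains the parameters
$p_0, p_1, \dots, p_k$; the other factors contain the parameters $\mu_{im}$ and $\sigma_{ij}$.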
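Because the likelihood factors, each set of parameters can be estimated from its own factor. The sketch below shows what that amounts to in practice, under the assumption that the maximum likelihood estimates are the relative frequencies $n_m/n$, the within-category means, and the pooled within-category sums of products divided by $n$ (the quantity $S/n$ of Section 6.1). The function name and the data layout are illustrative, not the authors'.

```python
def mle_estimates(groups, ys):
    """Estimates suggested by the factorization (hypothetical helper).

    groups[a] in {0, ..., k} is the category of observation a (the index m with x_ma = 1);
    ys[a] is the corresponding y-vector given as a list of floats.
    """
    n = len(ys)
    dim = len(ys[0])
    n_cat = max(groups) + 1
    counts = [0] * n_cat
    sums = [[0.0] * dim for _ in range(n_cat)]
    for m, y in zip(groups, ys):
        counts[m] += 1
        for i in range(dim):
            sums[m][i] += y[i]
    p_hat = [c / n for c in counts]                                  # relative frequencies
    means = [[s / c for s in row] for row, c in zip(sums, counts)]   # assumes every n_m > 0
    sigma = [[0.0] * dim for _ in range(dim)]
    for m, y in zip(groups, ys):
        d = [y[i] - means[m][i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                sigma[i][j] += d[i] * d[j] / n                       # pooled within, divisor n
    return p_hat, means, sigma
```

The divisor `n` rather than `n - k - 1` is what maximum likelihood gives; an unbiased version would rescale `sigma` accordingly.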


8. Summary of procedures, with examples. 

8.1. Case p > 1, k = 1.

(i) etiam may be computed from (4.1). 

(ii) The hypothesis H: p2,:y,-- -vp) = 0 with other parameters arbitrary may be 
tested by the Q-statistic of Theorem 4.1: reject if $Q \ge c$, where c is obtained
from a table of the F-distribution.

(iii) Examples of all cases are provided by some of the studies described by 
Rokeach [11]. In experiments concerning attitudes individuals were classified as 
Northerners or Southerners; according to membership in the British political 
parties: Liberals, Conservatives, Laborites, or Communists; according to re- 
ligious affiliation. The individuals also received scores on Dogmatism and Opin- 
ionation scales. Thus, he considers situations in which p = 2. For illustrative 
purposes, we have chosen a restricted set of data from one of his studies. 







y2 yi y2 
133 143 

156 159 135 
151 168 198 
136 192 169 
164 168 117 
1 142 138 


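$x_0 = 1$ for a Northerner; $x_1 = 1$ for a Southerner.
$y_1$ = dogmatism score; $y_2$ = opinionation score.


For this example $n = 12$, $n_0 = 5$, $n_1 = 7$. The formula $\bar y_i^{(m)} = \sum_\alpha y_{i\alpha}x_{m\alpha}/n_m$,
and the expression for $s_{ij}$ given at the beginning of Section 4, give

$$\bar y_1^{(0)} = 175.80, \quad \bar y_2^{(0)} = 151.00, \quad \bar y_1^{(1)} = 152.57, \quad \bar y_2^{(1)} = 148.29,$$
$$s_{11} = 553.76, \quad s_{12} = 180.87, \quad s_{22} = 471.46.$$

Whence, $s^{11} = .0020645$, $s^{12} = -.0007920$, $s^{22} = .0024249$, and $\bar y^{(1)} - \bar y^{(0)} = (-23.23, -2.71)$. Then,

$$T^2 = (n_0n_1/n)(\bar y^{(1)} - \bar y^{(0)})\,S^{-1}(\bar y^{(1)} - \bar y^{(0)})' = 3.010,$$

and

$$r^2 = T^2/(10 + T^2) = .2314, \qquad r = .481.$$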


Finally, Q = 1.355. Referring to the F-table for 2 and 9 degrees of freedom, 
we see that a Q-value of 9.38, and hence an r° value of .6757, is required for 
significance. Thus, we accept the hypothesis that there is no particular associa- 
tion between Dogmatism and Opinionation (as defined by Rokeach) on the one 
hand, and the section of the country in which the individual lives on the other 
hand. 
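The arithmetic of this example can be retraced from the summary statistics alone. The sketch below is illustrative: it inverts the 2 × 2 matrix $S$, forms $T^2$ and $r^2 = T^2/(n - 2 + T^2)$, and computes $Q$ as $((n - p - 1)/p)\,r^2/(1 - r^2)$. That expression for $Q$ is an assumption here, chosen because it reproduces the reported value 1.355 with 2 and 9 degrees of freedom; Theorem 4.1 itself is not reproduced in this section. The function name is hypothetical.

```python
def hotelling_summary(n0, n1, diff, s11, s12, s22, p=2):
    """Return (T^2, r^2, Q) from group sizes, mean differences and the entries of S."""
    n = n0 + n1
    det = s11 * s22 - s12 * s12
    # Elements of the inverse of the 2 x 2 matrix S.
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det
    quad = i11 * diff[0] ** 2 + 2 * i12 * diff[0] * diff[1] + i22 * diff[1] ** 2
    t2 = n0 * n1 / n * quad
    r2 = t2 / (n - 2 + t2)
    q = (n - p - 1) / p * r2 / (1 - r2)   # assumed form of the Q-statistic
    return t2, r2, q


t2, r2, q = hotelling_summary(5, 7, (-23.23, -2.71), 553.76, 180.87, 471.46)
```

With the rounded inputs printed above, the results agree with the reported $T^2 = 3.010$, $r^2 = .2314$ and $Q = 1.355$ to the precision shown.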

(iv) In order to obtain the power of the test in part (ii) for a specified alternative
$(p_0, p_1, \rho)$ we may consult the tables of Tang [12] or the charts of Pearson
and Hartley [9]. The quantity $\tau^2$ of Theorem 4.1 must be computed for $p_0$, $p_1$, $\rho$,
and each possible partition $n = n_0 + n_1$. The probability of a Type II error is
then obtained as a mixture of various values of $P_{II}$ from the tables or charts. A
calculation of this kind is given by Tate ([14], p. 1083).

(v) When $p_0$, $p_1$ are known, we can test the hypothesis $H: \rho = \text{constant}$ (not
necessarily zero) by using the distribution (4.5) of Theorem 4.2. An alternative
to this is to make a variance stabilizing transformation $\phi$ (this is discussed in
Section 5.2.1 in connection with partial correlation), and then test $H: \phi(\rho) = \phi(\text{constant})$.
For the case at hand we have

$$\phi(r) = (4p_0p_1)^{1/2}(1 - 2p_0p_1)^{-1/2}\tanh^{-1} r(1 - 2p_0p_1)^{1/2}\{4p_0p_1 + (1 - 6p_0p_1)r^2\}^{-1/2}$$

and, of course, $\phi(r) \sim N(\phi(\rho), 1/n)$.

(vi) Confidence limits for $\rho$ can be obtained when $p_0$ and $p_1$ are known by first
finding confidence limits for $\phi(\rho)$ and then obtaining from them the limits for $\rho$.
See Tate [13] for an example with $p = 1$, $k = 1$, $p_0 = p_1 = \tfrac12$.

8.2. Case p = 1,k > 1. 







(i) 75 = roce,,---25) May be computed from (5.1), (5.2), or (5.3). 

(ii) The hypothesis H:p) = 0 with other parameters arbitrary may be tested 
by the Z-statistic of Theorem 5.1. 

(iii) The asymptotic distribution of ro is given by Theorem 5.2. Note that 
nuisance parameters are present whatever the values of po, --- , p, (recall that 
they must all be positive). To test H:p} = constant (not necessarily zero) we 
can use the asymptotic distribution, with estimates for the nuisance parameters 
in any form of the asymptotic variance which is convenient to use. The alterna- 
tive to this is to use the lower bound (1 — po)?(2 — po)/2n and reject too often 
when H is true, or use the upper bound (1 — ps)"(2 — po)[1 + 4p0(2 — po) 
- $96 (px — 2)]/2n and reject too rarely when H is true. 

(iv) The remarks of (iii) apply also to the determination of a confidence 
interval for pp. The upper bound for the asymptotic variance would lead to a 
conservative confidence interval worthy of more confidence than we place on it. 

8.3. Case p > 1, k > 1.

(i) $r^2$ is computed from (6.3), using the notation of Section 6.1.

(ii) The hypothesis $H: \rho = 0$ with other parameters arbitrary is tested by
the statistic $1 - r^2$, which has the $U_{p,k,n-k-1}$ distribution under H. See the remarks
in Section 6.2 concerning the availability and scope of tables, and Rao's form
for the distribution function of the transformed variate $-(n - \tfrac12(p + k + 3))\log(1 - r^2)$.


9. Acknowledgment. The authors are indebted to Professor Milton Rokeach, 
Michigan State University, for making his data available, and to the referee for 
the present proofs of Lemma 3.1 and Theorem 3.2. 


REFERENCES


[1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, John Wiley
and Sons, New York, 1958.

[2] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press,
Princeton, New Jersey, 1946.

[3] R. A. Fisher, "On a distribution yielding the error functions of several well-known
statistics," Proc. Int. Math. Congress, Toronto (1924), pp. 805-813.

[4] John W. Hooper, "The sampling variance of correlation coefficients under assumptions
of fixed and mixed variates," Biometrika, Vol. 45 (1958), pp. 471-477.

[5] Harold Hotelling, "The generalization of Student's ratio," Ann. Math. Stat.,
Vol. 2 (1931), pp. 360-378.

[6] Harold Hotelling, "Relations between two sets of variates," Biometrika, Vol. 28
(1936), pp. 321-377.

[7] P. C. Mahalanobis, "On the generalized distance in statistics," Proc. Nat. Inst. Sci.
India, Vol. 12 (1936), pp. 49-55.

[8] M. D. Moustafa, "Tests of hypotheses on a multivariate population, some of the
variables being continuous and the rest categorical," Institute of Statistics
Mimeograph Series No. 179, Chapel Hill, North Carolina, 1957.

[9] E. S. Pearson and H. O. Hartley, "Charts of the power function for analysis of
variance tests, derived from the noncentral F-distribution," Biometrika, Vol. 38
(1951), pp. 112-130.

[10] C. Radhakrishna Rao, "Tests of significance in multivariate analysis," Biometrika,
Vol. 35 (1948), pp. 58-79.

[11] Milton Rokeach, The Open and Closed Mind, Basic Books, New York, 1960.

[12] P. C. Tang, "The power function of the analysis of variance tests with tables and illustrations
of their use," Stat. Res. Memoirs, Vol. 2 (1938), pp. 126-149 and tables.

[13] R. F. Tate, "Correlation between a discrete and a continuous variable," Ann. Math.
Stat., Vol. 25 (1954), pp. 603-607.

[14] R. F. Tate, "Applications of correlation models for biserial data," J. Amer. Stat.
Assn., Vol. 50 (1955), pp. 1078-1095.

[15] S. S. Wilks, "On the independence of k sets of normally distributed statistical variables,"
Econometrica, Vol. 3 (1935), pp. 309-326.





LIMITS FOR A VARIANCE COMPONENT WITH AN EXACT 
CONFIDENCE COEFFICIENT 


By W. C. Healy, Jr.
Ethyl Corporation, Detroit 


1. Introduction and summary. This paper deals with confidence interval esti- 
mation and hypothesis testing for components of variance, in analysis-of-variance 
situations embraced by the Model II of Eisenhart [1], including also the so- 
called nested classifications. Several authors have treated the problem of setting 
confidence limits for variance components, and several approximate methods 
have been proposed. Four approximate methods are described in Anderson and 
Bancroft [2] and briefly in Crump [3], and references to original sources and ex- 
tensive bibliographies are given in both [2] and [3]. In [4], Green gives more 
refined approximations which are, however, not presently in a form for practical 
use. Huitson [5], Welch [6] and Cochran [7] discuss related problems involving 
linear combinations of variances, and offer approximate methods for these prob- 
lems. The many references on approximate tests and confidence limits in variance 
component problems emphasize the absence of exact methods. The present paper 
points out that, using a randomization device, exact confidence limits and tests 
for a variance component become available in a simple way. These exact confi- 
dence limits will usually but not always define a single confidence interval; in the 
exceptional case (having small probability in practice) the exact limits may de- 
fine an interval with a gap in it. Numerical illustrations are given, together with 
comparisons with results using some of the available approximate methods. 
Also, asymptotic power comparisons between the exact test and two approximate 
tests are discussed. 

There are at least three notions of confidence that can be associated with the 
statement: "$a(x) \le \theta \le b(x)$ is a $100(1 - \alpha)\%$ confidence interval" for the
parameter $\theta$ with possible nuisance parameters $\eta$, based on observations $x$.
They are

(a) $\Pr\{a(x) \le \theta \le b(x)\} = 1 - \alpha$ for all $\theta$, $\eta$.

(b) $\Pr\{a(x) \le \theta \le b(x)\} \ge 1 - \alpha$ for all $\theta$, $\eta$, with equality for some $\theta$, $\eta$.

(c) $\Pr\{a(x) \le \theta \le b(x)\} \ge 1 - \alpha$ for all $\theta$, $\eta$.
The phrase “exact confidence” we shall interpret in the sense of (a) above. So 
far as the author is aware, for a variance component no confidence limits satis- 
fying either of the notions (a) or (b) have previously been constructed. An inter- 
val satisfying (c) has been constructed in [8], using a two-stage sampling pro- 
cedure. 

Received June 13, 1957; revised October 21, 1960.

In the present approach, the mathematical difficulties ordinarily caused in
variance component analysis by the presence of nuisance parameters are circumvented
by a process of randomization. The resulting confidence limits depend
not only on the mean squares of the analysis of variance table, but also on 
auxiliary observations on a random variable with known normal distribution. A 
consequence is that two statisticians confronted with the same analysis of vari- 
ance table will in general construct different confidence limits. While practically 
this may afford some discomfiture, it remains true nevertheless that these exact 
confidence limits meet the ordinary claim (a) as to probability of containing the 
true variance component. In the examples tried, the limits are plausible and 
they are not difficult to compute. Moreover, the agreement between numerical 
results using the method proposed herein and the usual approximate methods 
may serve to increase one’s faith in the approximate methods in small samples. 


2. Statement of the problem. Specifically, the kind of problem dealt with here
can be described in terms of two observed mean squares U and V such that
$nV/\sigma^2$ and $mU/(\sigma^2 + r\sigma_0^2)$ are independently distributed in chi-square distributions
with n and m degrees of freedom respectively. The variance components
$\sigma^2$ and $\sigma_0^2$ are unknown, and r is a known constant depending on the experimental
design. It is required to find confidence limits for $\sigma_0^2$ having confidence coefficient
exactly $1 - \alpha$. This problem arises from balanced Model II variance
analyses, from certain Mixed Model analyses, and from analyses of nested
classifications, when suitable normality assumptions are made. For full discussions
of these analyses and models, the reader is referred to [1], [2], and [3], and
the accompanying bibliographies.

Any problem of the above type, concerned with estimating or testing hypothe- 
ses on a variance component, can always be reduced to a corresponding problem 
concerning the difference between unknown variances of two normal distributions
with known zero means. In the following section the latter formulation will
be used, later converting the results into the usual terms of variance component
analysis.


3. Exact confidence limits for the difference between two variances. Suppose
there are two independent samples $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_m)$
from $N(0, \sigma_1^2)$ and $N(0, \sigma_1^2 + \sigma_2^2)$ respectively. It is required to construct exact
two-sided confidence limits for $\sigma_2^2$, having confidence coefficient $1 - \alpha$. (Corresponding
one-sided limits are easily obtained from the two-sided case.) Now an
equivalent problem is that of constructing a similar, size-$\alpha$ test of

$$H: \sigma_2^2 = \delta^2$$
$$A: \sigma_2^2 \ne \delta^2;$$

that is, a test satisfying $\Pr\{\text{reject } H \text{ when } \sigma_2^2 = \delta^2\} = \alpha$ for all $\sigma_1^2$. Such a test
will yield exact confidence limits for $\sigma_2^2$.
It is, however, easy to construct such a test by the adjunction of an independent
sample $(z_1, z_2, \dots, z_n)$ from $N(0, 1)$; the statistic defined by

(1)
$$w = \frac{\sum_1^m y^2/m}{\sum_1^n (x + \delta z)^2/n}$$

has, when H is true, the F distribution with (m, n) degrees of freedom and accordingly
provides similar tests of H. For example, the "equal tail" acceptance
region for H is defined by $F_1 \le w \le F_2$, where $F_1$ and $F_2$ are respectively the
$100(\tfrac12\alpha)$ and $100(1 - \tfrac12\alpha)$ per cent points of the F distribution with (m, n)
degrees of freedom. Corresponding one-sided tests can be described in the obvious
way. We shall agree to define w for $\delta \ge 0$ only, corresponding to the positive root
of $\sigma_2^2$; the sign of $\delta$ does not affect the validity of the significance test, but a
consistent sign for $\delta$ is necessary in deriving the confidence limits.
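The randomized statistic is simple to compute once the auxiliary normal deviates are in hand. The following is a minimal sketch, assuming $w$ is the ratio of $\sum y^2/m$ to $\sum (x + \delta z)^2/n$ as described above; the function names are hypothetical, and the percent points $F_1$, $F_2$ are taken as inputs to be read from an F table rather than computed.

```python
import random


def similar_test_statistic(x, y, delta, z):
    """w = (sum y^2 / m) / (sum (x + delta*z)^2 / n) for auxiliary N(0,1) deviates z."""
    n, m = len(x), len(y)
    numerator = sum(v * v for v in y) / m
    denominator = sum((xi + delta * zi) ** 2 for xi, zi in zip(x, z)) / n
    return numerator / denominator


def accept_h(w, f_lower, f_upper):
    """Equal-tail acceptance region: f_lower <= w <= f_upper (values from an F table)."""
    return f_lower <= w <= f_upper


def draw_auxiliary_sample(n, seed=None):
    """Auxiliary sample of n independent N(0,1) deviates (stands in for a printed table)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

When `delta` is zero the auxiliary sample drops out and `w` reduces to the ordinary variance ratio, which is a convenient sanity check.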

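To obtain exact confidence limits for $\sigma_2^2$ starting with the equal-tail acceptance
region based on w, we have

$$\{F_1 \le w \le F_2\} = \Bigl\{\frac{1}{F_2}\frac{n}{m}\sum y^2 \le \sum (x + \delta z)^2 \le \frac{1}{F_1}\frac{n}{m}\sum y^2\Bigr\}$$

(2)
$$= \Bigl\{\frac{n}{mF_2}\sum y^2 - \sum x^2 \le 2\delta\sum xz + \delta^2\sum z^2 \le \frac{n}{mF_1}\sum y^2 - \sum x^2\Bigr\}.$$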


From (2) it is seen that an undesirable feature has crept in; if we proceed in the
obvious way from (2), the resulting limits will involve terms in $\sum xz$. Such
terms would prove inconvenient in applications to variance component analysis,
since independent quantities $x_1, x_2, \dots, x_n$ having distribution $N(0, \sigma_1^2)$ are
not observed there, and would become available only after suitable transformation
of the original data. It would be better to have confidence limits which
depend on the original data only through the mean squares (e.g., U and V)
usually computed in the analysis of variance. The argument which follows
serves to construct such computationally more convenient limits.
Divide the inequalities (2) by $2\delta(\sum x^2)^{1/2}$, obtaining

(3)
$$\frac{1}{2\delta(\sum x^2)^{1/2}}\Bigl[\frac{n}{mF_2}\sum y^2 - \sum x^2\Bigr] \le \frac{\sum xz}{(\sum x^2)^{1/2}} + \frac{\delta\sum z^2}{2(\sum x^2)^{1/2}} \le \frac{1}{2\delta(\sum x^2)^{1/2}}\Bigl[\frac{n}{mF_1}\sum y^2 - \sum x^2\Bigr].$$

Let

$$\zeta = \frac{\sum xz}{(\sum x^2)^{1/2}} + \frac{\delta\sum z^2}{2(\sum x^2)^{1/2}}.$$




Now for fixed $(x_1, x_2, \dots, x_n)$, an orthogonal transformation from
$(z_1, z_2, \dots, z_n)$ to $(z_1', z_2', \dots, z_n')$ with $z_1' = (\sum xz)/(\sum x^2)^{1/2}$ yields

(4)
$$\zeta' = z_1' + \frac{\delta\sum z'^2}{2(\sum x^2)^{1/2}},$$

where $(z_1', z_2', \dots, z_n')$ are each distributed $N(0, 1)$, are mutually independent
and are independent of $(x_1, x_2, \dots, x_n)$. The statistics $\zeta$ and $\zeta'$ have the same
distribution and since $\zeta'$ is computable directly from $\sum x^2$ and $(z_1, z_2, \dots, z_n)$
it will serve as the desired replacement for $\zeta$. From this discussion, a third possible
substitute for $\zeta$ is seen to be

$$\zeta'' = z + \frac{\delta(z^2 + \chi^2_{n-1})}{2(\sum x^2)^{1/2}},$$

where z is $N(0, 1)$, $\chi^2_{n-1}$ is a chi-square variate with $n - 1$ degrees of freedom,
and z and $\chi^2_{n-1}$ are independent.
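Replacing $\zeta$ by $\zeta'$ in (3), and dropping primes, we have

$$\frac{1}{2\delta(\sum x^2)^{1/2}}\Bigl[\frac{n}{mF_2}\sum y^2 - \sum x^2\Bigr] \le z_1 + \frac{\delta\sum z^2}{2(\sum x^2)^{1/2}} \le \frac{1}{2\delta(\sum x^2)^{1/2}}\Bigl[\frac{n}{mF_1}\sum y^2 - \sum x^2\Bigr],$$

which yields, upon completing the square in $\delta$,

(5)
$$\frac{1}{\sum z^2}\Bigl[\frac{n}{mF_2}\sum y^2 - \sum x^2\Bigr] + \Bigl(\frac{z_1(\sum x^2)^{1/2}}{\sum z^2}\Bigr)^2 \le \Bigl(\delta + \frac{z_1(\sum x^2)^{1/2}}{\sum z^2}\Bigr)^2 \le \frac{1}{\sum z^2}\Bigl[\frac{n}{mF_1}\sum y^2 - \sum x^2\Bigr] + \Bigl(\frac{z_1(\sum x^2)^{1/2}}{\sum z^2}\Bigr)^2.$$

An exact confidence region for $\sigma_2^2$ is then defined as the set of values $\delta^2$ satisfying
both (5) and $\delta \ge 0$. These are the values of $\sigma_2^2$ for which the null hypothesis
$\sigma_2^2 = \delta^2$ would be accepted, and since the acceptance region (1) has probability
identically $1 - \alpha$ when $\sigma_2^2 = \delta^2$, the confidence region defined by (5) and $\delta \ge 0$
has confidence coefficient exactly $1 - \alpha$.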
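It remains to determine the limits defining this region, and the discussion will
be simplified if we let

$$b = -\frac{z_1(\sum x^2)^{1/2}}{\sum z^2}, \qquad k = \frac{1}{\sum z^2}\Bigl[\frac{n}{mF_2}\sum y^2 - \sum x^2\Bigr], \qquad l = \frac{1}{\sum z^2}\Bigl[\frac{n}{mF_1}\sum y^2 - \sum x^2\Bigr].$$

In these terms the inequalities (5) become

$$k + b^2 \le (\delta - b)^2 \le l + b^2,$$

which constrain $\delta$ to satisfy either

(6) $\quad b - (l + b^2)^{1/2} \le \delta \le b - \{\max(0, k + b^2)\}^{1/2}$

or else

(7) $\quad b + \{\max[0, k + b^2]\}^{1/2} \le \delta \le b + \{l + b^2\}^{1/2}.$


(We have assumed $l + b^2 \ge 0$; the confidence region is empty otherwise.) Consideration
of (6) and (7), together with the requirement $\delta \ge 0$, shows that the
confidence interval will be defined by (7) alone, unless $-b^2 \le k < 0$ and $b > 0$
both obtain. In the latter case the confidence region consists of both (7) and
the non-overlapping interval

(8) $\quad \max(b - \{l + b^2\}^{1/2}, 0) \le \delta \le b - \{\max(0, k + b^2)\}^{1/2}.$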
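The case analysis for (6), (7) and (8) can be written as a short routine. The sketch below is illustrative, not the author's procedure verbatim: it collects every $\delta \ge 0$ with $k + b^2 \le (\delta - b)^2 \le l + b^2$, which reduces to (7) alone in the usual case and to (7) together with (8) when $-b^2 \le k < 0$ and $b > 0$. The function name is hypothetical, and pieces that touch are merged into one interval.

```python
import math


def delta_region(b, k, l):
    """Intervals of delta >= 0 satisfying k + b^2 <= (delta - b)^2 <= l + b^2."""
    if l + b * b < 0:
        return []                                   # empty confidence region
    outer = math.sqrt(l + b * b)
    inner = math.sqrt(max(0.0, k + b * b))
    pieces = [(b - outer, b - inner),               # interval (6), clipped below to give (8)
              (b + inner, b + outer)]               # interval (7)
    kept = []
    for lo, hi in pieces:
        lo = max(lo, 0.0)                           # impose delta >= 0
        if lo <= hi:
            kept.append((lo, hi))
    kept.sort()
    merged = []
    for lo, hi in kept:
        if merged and lo <= merged[-1][1]:          # touching or overlapping pieces
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged
```

For example, `delta_region(2.0, -3.0, 5.0)` gives two pieces, while `delta_region(-1.0, 3.0, 15.0)` gives the single interval from (7).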


There thus exists a possibility that the exact confidence limits may define a
region consisting of two separate intervals, which would in practice be a disconcerting
event (though of course the "gap" could be included in the confidence
region at the sacrifice of the exactness property). It is of interest to examine the
chances of getting two intervals, and we note first that $b^2$ converges to zero in
probability so that asymptotically the chance of two intervals is negligible.
Also, $\Pr\{b > 0\} = \tfrac12$. Note too that $k < 0$ is the condition that the variance
ratio $(\sum y^2/m)/(\sum x^2/n)$ be not significant at the $\tfrac12\alpha$ level for testing $\sigma_2^2 = 0$.

Some simple computations can give a rough bound on the probability that 
two intervals will result. Using the definition for k, the condition —b’ < k < 0 
is readily found equivalent to the condition. 


(6 sNGd «ren (8 
> oi + 0 ai + 03 

where F has the F-distribution with (m, n) degrees of freedom. The probability 
of this event will be greatest when F.[(oi + 8°)/(ci + 3)] is close to the mode 
(mn — 2n)/(mn + 2m) of this F distribution, or roughly when F, ~ (01 + o2)/ 
(oj + 8). Thus we can get a rough upper bound to the probability of two in- 
tervals by computing the probability in an interval of length zi/F. >. 2 at the 
mode of this F distribution. In the neighborhood of the mode, a normal approximation
should suffice for our present purposes. Taking F to be approximately
normal $N[1, 2(1 + a)/(na)]$, where $a = m/n$, the desired probability bound
can then be approximated by





= zi =e 1 ) pe 1 ( a =) 
E4F, >> 2 (2) = + Ny (Qen)'F, \2(1 + a)/° 


a 







Numerically, for m = 5, n = 24 as an illustration, we obtain approximately .01 as an
approximate maximum to the probability of $-b^2 \le k < 0$. Since for two intervals
to result, we must also have $b > 0$, we can say that roughly speaking the probability
of two intervals will not exceed approximately .005 for these values of m and n, no
matter what the configuration of $\sigma_1^2$ and $\sigma_2^2$. We conclude from this type of investigation
that the possibility of a confidence region consisting of two intervals
by the exact method of this paper is remote for practical values of m and n,
and hence should not cause difficulty in practice.


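4. Exact confidence limits for a variance component. It remains now to convert
the result of Section 3 to the variance component problem of Section 2. The
complete procedure can be stated as follows. Let $nV/\sigma^2$ and $mU/(\sigma^2 + r\sigma_0^2)$ have
independent chi-square distributions with n and m degrees of freedom respectively,
V and U being observed mean squares in a variance component
analysis and r being a known constant. Let $(z_1, z_2, \dots, z_n)$ be a sample of size
n from $N(0, 1)$, which can be obtained from tables of random normal deviates,
e.g. [10]. Then exact two-sided $100(1 - \alpha)\%$ confidence limits for $\sigma_0^2$ are given
(for the usual case of a single interval) by

(9) Lower limit:
$$\frac{1}{r}\Bigl\{\Bigl[\max\Bigl(0,\ \frac{n}{\sum z^2}\Bigl(\frac{U}{F_2} - V\Bigr) + \Bigl(\frac{z_1(nV)^{1/2}}{\sum z^2}\Bigr)^2\Bigr)\Bigr]^{1/2} - \frac{z_1(nV)^{1/2}}{\sum z^2}\Bigr\}^2$$

(10) Upper limit:
$$\frac{1}{r}\Bigl\{\Bigl[\frac{n}{\sum z^2}\Bigl(\frac{U}{F_1} - V\Bigr) + \Bigl(\frac{z_1(nV)^{1/2}}{\sum z^2}\Bigr)^2\Bigr]^{1/2} - \frac{z_1(nV)^{1/2}}{\sum z^2}\Bigr\}^2$$

where $F_1$ and $F_2$ are respectively the $100(\tfrac12\alpha)$ and $100(1 - \tfrac12\alpha)$ per cent points
of the F distribution with (m, n) degrees of freedom. Furthermore, if

$$-\Bigl(\frac{z_1(nV)^{1/2}}{\sum z^2}\Bigr)^2 \le \frac{n}{\sum z^2}\Bigl(\frac{U}{F_2} - V\Bigr) < 0 \quad\text{and}\quad z_1 < 0,$$

the interval having

(11) Lower limit:
$$\frac{1}{r}\Bigl\{\max\Bigl[0,\ -\frac{z_1(nV)^{1/2}}{\sum z^2} - \Bigl(\frac{n}{\sum z^2}\Bigl(\frac{U}{F_1} - V\Bigr) + \Bigl(\frac{z_1(nV)^{1/2}}{\sum z^2}\Bigr)^2\Bigr)^{1/2}\Bigr]\Bigr\}^2$$

(12) Upper limit:
$$\frac{1}{r}\Bigl\{-\frac{z_1(nV)^{1/2}}{\sum z^2} - \Bigl[\max\Bigl(0,\ \frac{n}{\sum z^2}\Bigl(\frac{U}{F_2} - V\Bigr) + \Bigl(\frac{z_1(nV)^{1/2}}{\sum z^2}\Bigr)^2\Bigr)\Bigr]^{1/2}\Bigr\}^2$$

is to be included as well.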

An equivalent alternate procedure is to observe $z_1$ and $\chi^2_{n-1}$ and replace
$\sum z^2$ by $(z_1^2 + \chi^2_{n-1})$ in the preceding limits. However, tables of the chi-square
distribution are not sufficiently complete to permit simulation of sampling
from a chi-square distribution and there are no tables of random $\chi^2$ variates.
While one could employ tables of random numbers and a table of the Incomplete
Gamma Function, it may be that the best way to obtain a $\chi^2_{n-1}$ variate is as the
sum of the squares of $n - 1$ $N(0, 1)$ variates, which brings us back to the first
procedure.
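The complete procedure of this section can be sketched in a few lines. The code below is an illustration under stated assumptions: it applies the region of Section 3 with $\sum x^2 = nV$ and $\sum y^2 = mU$, then converts limits on $\delta$ to limits on $\sigma_0^2 = \delta^2/r$. The function name is hypothetical, the F percent points are supplied by the caller (for instance from a table), and the numerical arguments in the usage lines are placeholders, not values from the paper.

```python
import math


def exact_limits(U, V, r, n, f_lower, f_upper, z1, sum_z2):
    """Intervals for sigma_0^2 from mean squares U, V and auxiliary normal deviates.

    f_lower, f_upper: the 100(alpha/2) and 100(1 - alpha/2) per cent points of F(m, n).
    z1, sum_z2: first auxiliary deviate and the sum of squares of all n deviates.
    """
    b = -z1 * math.sqrt(n * V) / sum_z2
    k = n / sum_z2 * (U / f_upper - V)
    l = n / sum_z2 * (U / f_lower - V)
    if l + b * b < 0:
        return []                                   # empty region
    outer = math.sqrt(l + b * b)
    inner = math.sqrt(max(0.0, k + b * b))
    kept = []
    for lo, hi in ((b - outer, b - inner), (b + inner, b + outer)):
        lo = max(lo, 0.0)                           # delta >= 0
        if lo <= hi:
            kept.append((lo, hi))
    merged = []
    for lo, hi in sorted(kept):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return [(lo * lo / r, hi * hi / r) for lo, hi in merged]


# Placeholder inputs (n and the F points are assumptions for illustration only).
single = exact_limits(46659.0, 459.0, 300.0, 72, 0.1167, 2.751, 0.0, 72.0)
double = exact_limits(1250.0, 459.0, 300.0, 72, 0.1167, 2.751, -2.0, 72.0)
```

With `z1 = 0` and `sum_z2 = n` the randomization has no effect and the limits reduce to $(1/r)(U/F_2 - V)$ and $(1/r)(U/F_1 - V)$; the second call is a contrived case in which the extra interval appears.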







5. Numerical illustrations and comparisons. We shall take as an illustration
the example based on the analysis of variance table on page 323 of [2], from which
the authors give, on page 324, 90% confidence limits for a variance component
by several approximate methods. Three of these approximate methods may be
described briefly as

(i) normal approximation to the distribution of $(1/r)[U - V] = \hat\sigma_0^2$

(ii) $\chi^2$ approximation to the distribution of $\hat\sigma_0^2$

(iii) replacement of $\sigma^2$ by V in exact confidence limits for $\sigma_0^2/\sigma^2$.
The pertinent data are 


U = 46,659 
V = 459 
r = 300 


Based on sampled values $z_1 = 0.628$, $\sum_1^n z^2 = 62.72$, exact confidence limits
for $\sigma_0^2$ are computed from (9) and (10) as


Lower limit: 62 
Upper limit: 1514. 


Application of the three approximate methods gave the results in the following
table, from page 324 of [2].


            90% Confidence Limits
Method      Lower Limit      Upper Limit

(i)              0                316
(ii)            59               1313
(iii)           55               1331

The extent of variations to be expected between conventional approximate
methods and the present method may be studied by constructing an approximation
based on the present method. A simple approximation can be obtained from
(9) and (10) by replacing functions of $(z_1, z_2, \dots, z_n)$ by their exact or approximate
expected values. Using

$$E\Bigl(\frac{z_1(nV)^{1/2}}{\sum z^2}\Bigr) = 0, \qquad E\bigl(\textstyle\sum z^2\bigr) = n,$$

$$E\Bigl(\frac{z_1(nV)^{1/2}}{\sum z^2}\Bigr)^2 \approx \frac{E(nVz_1^2)}{E(\sum z^2)^2},$$

we have the slightly new approximate limits

(13) Lower limit:
$$\frac{1}{r}\Bigl[\frac{U}{F_2} - V\Bigl(\frac{n+1}{n+2}\Bigr)\Bigr]$$

(14) Upper limit:
$$\frac{1}{r}\Bigl[\frac{U}{F_1} - V\Bigl(\frac{n+1}{n+2}\Bigr)\Bigr].$$

Replacing $(n + 1)/(n + 2)$ in the above limits by unity yields the approximate
limits from method (iii) above. It is clear that the ratio between (14) and (10)
is essentially the ratio between n and $\sum z^2$, or $\chi^2_n/n$. That is, the exact limits will
tend to deviate from this conventional approximation (iii) proportionally to
variations in $\chi^2_n/n$, being more variable than the approximate limits because of
this extra element of random variation. The variance of $\chi^2_n/n$ being $2/n$, one
obtains an idea of the size of discrepancies to be encountered between exact and
approximate limits.
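The relation between the approximate limits and method (iii) can be made concrete. The sketch below assumes the approximate limits take the form $(1/r)[U/F - V(n+1)/(n+2)]$ and that method (iii) is the same expression with the factor $(n+1)/(n+2)$ replaced by unity. The function names are hypothetical, and the F percent points in the usage line are assumptions picked to be consistent with the method (iii) row of the table, not values quoted in the paper.

```python
def approximate_limits(U, V, r, n, f_lower, f_upper):
    """Approximate limits with the (n+1)/(n+2) factor; returns (lower, upper)."""
    shrink = (n + 1) / (n + 2)
    return ((U / f_upper - V * shrink) / r, (U / f_lower - V * shrink) / r)


def method_iii_limits(U, V, r, f_lower, f_upper):
    """Method (iii): the same limits with (n+1)/(n+2) replaced by unity."""
    return ((U / f_upper - V) / r, (U / f_lower - V) / r)


# U, V, r are the data of this section; 0.1167 and 2.751 are assumed percent points.
lower_iii, upper_iii = method_iii_limits(46659.0, 459.0, 300.0, 0.1167, 2.751)
```

The two sets of limits differ by the constant $V/[r(n+2)]$, which for the data above is a small fraction of a unit whenever n is moderately large.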


6. Power comparisons. As remarked earlier, the power of the similar test from 
which exact confidence limits have been derived herein can be computed directly 
from the F distribution. The one-sided tests are clearly unbiased. However, no 
investigation of the standing of these tests (one-sided or two-sided) in the class 
of all similar tests has been attempted. This would seem to be a difficult problem 
due to difficulties in characterizing similar tests of the hypothesis $\sigma_2^2 = \delta^2$. (The
similar test herein derived does not have Neyman structure, which means that
$(\sum x^2, \sum y^2)$ is not boundedly complete, which means that the methods depending
on boundedly complete sufficient statistics do not apply.)

One might suspect that the element of randomization introduced to achieve 
similarity could result in a serious impairment of power. For this reason, com- 
parisons of asymptotic power functions have been made among the similar test 
of Section 3 and the tests corresponding to approximate confidence intervals 
(ii) and (iii) of the preceding section. We consider only one-sided tests

$$H: \sigma_2^2 = \delta^2$$
$$A: \sigma_2^2 > \delta^2.$$

We do find that the exact test is somewhat inferior in large-sample power, but
that the amount of power impairment is not likely to be serious.


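Corresponding to the approximate confidence interval (ii) based on a chi-square
distribution, the one-sided rejection region is defined by

$$\frac{1}{\delta^2}\Bigl(\frac{\sum y^2}{m} - \frac{\sum x^2}{n}\Bigr) \ge \frac{\chi^2_f}{f},$$

where $\chi^2_f$ is the $100(1 - \alpha)\%$ point of the chi-square distribution with f degrees
of freedom, and f is determined here by

$$f = \frac{\Bigl(\dfrac{\sum y^2}{m} - \dfrac{\sum x^2}{n}\Bigr)^2}{\dfrac{1}{m}\Bigl(\dfrac{\sum y^2}{m}\Bigr)^2 + \dfrac{1}{n}\Bigl(\dfrac{\sum x^2}{n}\Bigr)^2}.$$

In these comparisons of large-sample power, we will consider that m and n approach
infinity in a fixed ratio, say $m = na$. Then $f = ng$, where

$$g = \frac{\Bigl(\dfrac{\sum y^2}{m} - \dfrac{\sum x^2}{n}\Bigr)^2}{\dfrac{1}{a}\Bigl(\dfrac{\sum y^2}{m}\Bigr)^2 + \Bigl(\dfrac{\sum x^2}{n}\Bigr)^2}.$$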
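To compute the asymptotic power of this test we first note that the quantity
$n^{1/2}\chi^2_f/f = \chi^2_{ng}/gn^{1/2}$ can be approximated by

$$\chi^2_{ng}/gn^{1/2} \approx [2gn^{1/2}]^{-1}[t_\alpha + (2ng - 1)^{1/2}]^2 = n^{1/2} + 2^{1/2}(t_\alpha/g^{1/2}) + \text{"terms"},$$

where the "terms" approach zero in probability and $t_\alpha$ is the $100(1 - \alpha)\%$
point of $N(0, 1)$. Also, the quantity $(n^{1/2}/\delta^2)[(\sum y^2/m) - (\sum x^2/n) - \sigma_2^2]$ has asymptotically
a normal distribution with mean zero, variance $2\delta^{-4}[\{(\sigma_1^2 + \sigma_2^2)^2/a\} + \sigma_1^4]$.
Accordingly the asymptotic power of this test can be written

$$\Pr\Bigl\{\frac{n^{1/2}}{\delta^2}\Bigl[\frac{\sum y^2}{m} - \frac{\sum x^2}{n} - \sigma_2^2\Bigr] \ge -\frac{n^{1/2}(\sigma_2^2 - \delta^2)}{\delta^2} + \frac{2^{1/2}t_\alpha}{\sigma_2^2}\Bigl[\frac{(\sigma_1^2 + \sigma_2^2)^2}{a} + \sigma_1^4\Bigr]^{1/2}\Bigr\},$$

which yields, after some reduction and application of standard theorems,

(15)
$$\Phi\Bigl(\rho\Bigl[2\Bigl(\frac{1}{a} + \frac{\sigma_1^4}{(\sigma_1^2 + \sigma_2^2)^2}\Bigr)\Bigr]^{-1/2} - t_\alpha\frac{\delta^2}{\sigma_2^2}\Bigr),$$

where $\Phi$ is the standard normal distribution function and

$$\rho = n^{1/2}[(\sigma_2^2 - \delta^2)/(\sigma_1^2 + \sigma_2^2)]$$

measures the divergence from the null hypothesis.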


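For the one-sided test based on the approximate confidence interval from
method (iii) of the preceding section, the rejection region is

$$\frac{\sum y^2/m}{\sum x^2/n + \delta^2} \ge F_\alpha.$$

Here the quantity

$$n^{1/2}\Bigl(\frac{\sum y^2/m}{\sum x^2/n + \delta^2} - \frac{\sigma_1^2 + \sigma_2^2}{\sigma_1^2 + \delta^2}\Bigr)$$

has asymptotically a normal distribution with mean zero, variance

$$2\,\frac{(\sigma_1^2 + \sigma_2^2)^2}{(\sigma_1^2 + \delta^2)^2}\Bigl[\frac{1}{a} + \frac{\sigma_1^4}{(\sigma_1^2 + \delta^2)^2}\Bigr],$$

and the asymptotic power function can accordingly be written

$$\Pr\Bigl\{n^{1/2}\Bigl(\frac{\sum y^2/m}{\sum x^2/n + \delta^2} - \frac{\sigma_1^2 + \sigma_2^2}{\sigma_1^2 + \delta^2}\Bigr) \ge n^{1/2}\Bigl(F_\alpha - \frac{\sigma_1^2 + \sigma_2^2}{\sigma_1^2 + \delta^2}\Bigr)\Bigr\}.$$

Using now the fact that $n^{1/2}(F - 1)$ is asymptotically $N(0, 2[(1/a) + 1])$, we
obtain for the power function, after some reduction,

(16)
$$\Phi\Bigl(\rho\Bigl[2\Bigl(\frac{1}{a} + \frac{\sigma_1^4}{(\sigma_1^2 + \delta^2)^2}\Bigr)\Bigr]^{-1/2} - t_\alpha\,\frac{\sigma_1^2 + \delta^2}{\sigma_1^2 + \sigma_2^2}\Bigl[\frac{1}{a} + 1\Bigr]^{1/2}\Bigl[\frac{1}{a} + \frac{\sigma_1^4}{(\sigma_1^2 + \delta^2)^2}\Bigr]^{-1/2}\Bigr).$$

It should be noted that the test statistic $(\sum y^2/m)/(\sum x^2/n + \delta^2)$ and F do
not have the same limiting distribution even on the null hypothesis, which
incidentally accounts for this more complicated form of the asymptotic power
function for approximate method (iii).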


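To cast the asymptotic power function of the one-sided similar test into analogous
form we use the limiting distribution of F. The power function is

$$\Pr\Bigl\{F \ge F_\alpha\,\frac{\sigma_1^2 + \delta^2}{\sigma_1^2 + \sigma_2^2}\Bigr\},$$

which asymptotically becomes

(17)
$$\Phi\Bigl(\rho\Bigl[2\Bigl(\frac{1}{a} + 1\Bigr)\Bigr]^{-1/2} - t_\alpha\,\frac{\sigma_1^2 + \delta^2}{\sigma_1^2 + \sigma_2^2}\Bigr).$$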

Comparison of the power expressions (15), (16), and (17) shows that the
approximate chi-square test based on method (ii) is asymptotically most powerful
among these three tests. Comparison of the F approximation method (iii)
with the similar test shows that for $\delta > 0$, the method (iii) will have superior
power for large values of $\rho$ while the similar test will have superior power for
small values of $\rho$.
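These orderings can be explored numerically. The sketch below is a reconstruction, not the author's computation: it evaluates the three asymptotic power expressions in the forms $\Phi(\rho[2(1/a + s)]^{-1/2} - t_\alpha c)$, with the slope term $s$ and the multiplier $c$ chosen per test as derived from the limiting variances stated in this section. The function names and the numerical arguments in the usage lines are illustrative assumptions.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def asymptotic_powers(rho, t_alpha, a, s1, s2, d2):
    """Return powers for (chi-square test (ii), F approximation (iii), similar test).

    s1 = sigma_1^2, s2 = true sigma_2^2, d2 = hypothesized delta^2, a = m/n.
    """
    total = s1 + s2
    ratio = (s1 + d2) / total
    # Test (ii): slope term uses sigma_1^4/(sigma_1^2+sigma_2^2)^2, multiplier delta^2/sigma_2^2.
    chi2_test = std_normal_cdf(
        rho / math.sqrt(2 * (1 / a + (s1 / total) ** 2)) - t_alpha * d2 / s2)
    # Test (iii): slope term uses sigma_1^4/(sigma_1^2+delta^2)^2, with an inflated multiplier.
    v3 = 1 / a + (s1 / (s1 + d2)) ** 2
    f_approx = std_normal_cdf(
        rho / math.sqrt(2 * v3) - t_alpha * ratio * math.sqrt((1 / a + 1) / v3))
    # Similar (randomized) test.
    similar = std_normal_cdf(rho / math.sqrt(2 * (1 / a + 1)) - t_alpha * ratio)
    return chi2_test, f_approx, similar


small = asymptotic_powers(2.0, 1.645, 0.05, 1.0, 1.2, 1.0)
large = asymptotic_powers(20.0, 1.645, 0.05, 1.0, 1.2, 1.0)
```

In this illustration the chi-square test has the largest power at both values of $\rho$, the similar test beats method (iii) at the small value, and the order of those two reverses at the large value, in line with the comparison stated above.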

It is of interest to evaluate the magnitudes of these power differences. Consider
a comparison of (15) and (17), which compares the asymptotic power of the
similar test with that of the best test among these three. We first note that
$n \to \infty$ and the definition of $\rho$ imply that for $\delta > 0$ both $(\sigma_1^2 + \delta^2)/(\sigma_1^2 + \sigma_2^2)$
and $\delta^2/\sigma_2^2$ will be close to unity, so that the difference in power resulting from the
difference in multipliers of $t_\alpha$ can be neglected. One way to compare tests is on
the basis of sample sizes required for equivalent power, and for equivalent power
we see from (15) and (17) that sample sizes must be approximately in the ratio

$$R = \Bigl(\frac{1}{a} + 1\Bigr) \div \Bigl(\frac{1}{a} + \frac{\sigma_1^4}{(\sigma_1^2 + \sigma_2^2)^2}\Bigr)$$

for these two tests. Now R must satisfy

(18) $\quad 1 \le R \le 1 + a,$


and a is ordinarily a fairly small number, corresponding to the fact that a “between”
variance is usually estimated on much fewer degrees of freedom than is a
“within” variance. For example, with a one-way classification having r observa- 
tions per class, 1/a ~ r. The inequality (18) thus means that with a = #5 as 
found in the first numerical example of Section 5, the exact test requires less 
than 4% more observations for equivalent asymptotic power. 

These results seem to say, without further numerical investigation, that for 
most designs yielding fairly small values of a, our use of randomization to achieve 







exact similarity in small samples does not cost much in terms of large-sample 
power. Numerical power comparisons would be interesting but have not been
computed. 
REFERENCES 
[1] Churchill Eisenhart, "The assumptions underlying the analysis of variance,"
Biometrics, Vol. 3 (1947), pp. 1-21.
[2] R. L. Anderson and T. A. Bancroft, Statistical Theory in Research, McGraw-Hill,
New York, 1952.
[3] S. L. Crump, "The present status of variance component analysis," Biometrics, Vol. 7
(1951), pp. 1-16.
[4] J. R. Green, "A confidence interval for variance components," Ann. Math. Stat.,
Vol. 25 (1954), pp. 671-686.
[5] A. Huitson, "A method of assigning confidence limits to linear combinations of
variances," Biometrika, Vol. 42 (1955), pp. 471-479.
[6] B. L. Welch, "On linear combinations of several variances," J. Amer. Stat. Assn.,
Vol. 51 (1956), pp. 132-148.
[7] W. G. Cochran, "Testing a linear relation among variances," Biometrics, Vol. 7
(1951), pp. 17-32.
[8] A. Birnbaum and William C. Healy, Jr., "Estimates with prescribed variance based
on two-stage sampling," Ann. Math. Stat., Vol. 31 (1960), pp. 662-676.
[9] The RAND Corporation, One Million Random Digits and 100,000 Normal Deviates,
The Free Press, Glencoe, Ill., 1955.
[10] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
1946.





CONFIDENCE SETS FOR MULTIVARIATE MEDIANS¹
By P. G. Hoel and E. M. Scheuer²
University of California, Los Angeles 


0. Summary. This paper considers the problem of finding confidence sets of 
the parallelepiped type based on extreme order statistics for multivariate medians 
when no parametric assumptions are made. A partial characterization of a multi- 
variate distribution which will minimize the probability of the specified paral- 
lelepiped covering the multivariate median is given. This characterization 
enables one to obtain a sharp lower bound for the probability of coverage, pro- 
vided the number of medians does not exceed seven and under the assumption 
that the structure is independent of the sample size. 


1. Introduction. There exists a considerable amount of literature on the prob- 
lem of estimating means of multivariate distributions by means of confidence 
sets. Most of it is concerned with parametric models as such, or with parametric 
models that arise from asymptotic considerations. Furthermore, the confidence 
sets are often ellipsoids, but these are not the most useful kind for applications. 
Parallelepipeds are considerably more useful in most applications. Since medians 
are the natural substitutes for means in nonparametric problems, the problem 
considered here is that of finding confidence parallelepipeds for multivariate 
medians. 


2. Formulation. Let $(x_1, \cdots, x_t)$ be a random variable having the unique
median $(\nu_1, \cdots, \nu_t)$. Let $(x_{1j}, \cdots, x_{tj})$, $j = 1, \cdots, n$, denote the random
variables corresponding to a random sample of size $n$. The ordered values of a
sample will be denoted by

$$x_i(1) \le x_i(2) \le \cdots \le x_i(n).$$

Let the set $R$ be defined by

$$R = \{(\nu_1, \cdots, \nu_t) \mid x_i(1) \le \nu_i \le x_i(n),\ i = 1, \cdots, t\}.$$

The problem then is to find a sharp lower bound for

$$\Phi = P\{(\nu_1, \cdots, \nu_t) \in R\}.$$

The resulting value of $\Phi$ will be the confidence coefficient that can be guaranteed
for the confidence parallelepiped formed by the planes parallel to the coordinate
planes which pass through the extreme sample points for each coordinate,
regardless of the nature of the distribution of $(x_1, \cdots, x_t)$. A result by Dunn [1]


Received May 25, 1960; revised August 10, 1960.
¹ This research was supported by the Office of Naval Research.
² Now with Space Technology Laboratories, Inc.


for t = 2 shows that this lower bound is attained when the two variables are 
independent; however, this property does not hold for higher dimensions. 

The method that will be employed here is based on Bonferroni inequalities,
and shows that a distribution that minimizes $\Phi$ must possess a certain structure.
In the derivation of these inequalities, the following notation will be needed.

Let $E_i$ be the event that $x_i(1) > \nu_i$ if $i = 1, \cdots, t$, and the event that
$x_{i-t}(n) < \nu_{i-t}$ if $i = t + 1, \cdots, 2t$. Then, from the definition of $R$ and $\Phi$, it
follows that

$$1 - \Phi = P\{(\nu_1, \cdots, \nu_t) \notin R\} = P\Big\{\bigcup_{i=1}^{2t} E_i\Big\}.$$

The last expression can be written [2], p. 89, in the form

$$P\Big\{\bigcup_{i=1}^{2t} E_i\Big\} = S_1 - S_2 + \cdots - S_{2t}, \tag{1}$$

where

$$S_1 = \sum_{i=1}^{2t} P\{E_i\}, \qquad S_2 = \sum_{i<j} P\{E_i E_j\}, \qquad \cdots.$$

Because of the nature of the $E_i$, it follows that $S_{t+1} = \cdots = S_{2t} = 0$ here.

Probabilities such as $P\{E_i E_j\}$ depend only upon the probability mass assigned
to each of the orthants determined by a set of coordinate axes through the median
point $(\nu_1, \cdots, \nu_t)$. In this connection, let

$$q_{ij\cdots m} = P\{E_i E_j \cdots E_m\}$$

for $n = 1$. This quantity is defined only if all subscripts differ and provided
that no two subscripts are equal mod $t$. The latter restriction is necessary because
$E_i$ and $E_{i+t}$, where the sum $i + t$ is taken mod $2t$, are incompatible events.
Thus, $q_{ij\cdots m}$ yields the probability mass for the region determined by the proper
positive or negative coordinates for the variables $x_i - \nu_i, x_j - \nu_j, \cdots, x_m - \nu_m$.
If a subscript exceeds $t$, then the corresponding coordinate is negative, otherwise
it is positive.


3. The case of $t = 3$. The method of obtaining the desired inequalities for $\Phi$
is considerably simpler and neater when $t = 3$; therefore this case will be
considered first. In view of the definition of $E_i$ and $q_{ij\cdots m}$, it follows that

$$P\{E_i\} = q_i^n = (\tfrac12)^n, \qquad i = 1, \cdots, 6,$$
$$P\{E_i E_j\} = q_{ij}^n, \qquad i, j = 1, \cdots, 6,$$
$$P\{E_i E_j E_k\} = q_{ijk}^n, \qquad i, j, k = 1, \cdots, 6.$$

This notation applied to (1) will yield the expression

$$\Phi = 1 - S_1 + S_2 - S_3 = 1 - 6(\tfrac12)^n + \sum_{i<j} q_{ij}^n - \sum_{i<j<k} q_{ijk}^n. \tag{2}$$

Now it follows from the definition of $q_{ij\cdots m}$ that $q_{ij\cdots lm} + q_{ij\cdots l(m+t)} = q_{ij\cdots l}$.


Using this property and the fact that the $q$'s are nonnegative, one can obtain the
following inequality for $S_3$. In this derivation, sums are over all possible
permutations of indices for which the $q$'s are defined, unless specified otherwise.

$$S_3 = \tfrac16 \sum_{i,j,k} q_{ijk}^n \le \tfrac1{12} \sum_{i,j,k} [q_{ijk} + q_{ij(k+3)}]^n = \tfrac1{12} \sum_{i,j,k} q_{ij}^n = \tfrac16 \sum_{i,j} q_{ij}^n = \tfrac13 S_2.$$

In view of the convexity of $v^n$ for $v \ge 0$, it follows that

$$v_1^n + v_2^n \ge 2\Big(\frac{v_1 + v_2}{2}\Big)^n.$$

This inequality together with the fact that $q_i = \tfrac12$ may be used to derive the
following inequality for $S_2$.

$$S_2 = \tfrac12 \sum_{i,j} q_{ij}^n = \tfrac14 \sum_{i,j} [q_{ij}^n + q_{i(j+3)}^n] \ge \tfrac14 \sum_{i,j} 2(\tfrac12 q_i)^n = \tfrac12 \sum_{i,j} (\tfrac14)^n = 12(\tfrac14)^n.$$

If these two inequalities are applied to (2), one will obtain the inequality

$$\Phi \ge 1 - 6(\tfrac12)^n + 8(\tfrac14)^n. \tag{3}$$

Consider a probability distribution with

$$q_{123} = \tfrac14, \quad q_{126} = 0, \quad q_{135} = 0, \quad q_{156} = \tfrac14,$$
$$q_{246} = \tfrac14, \quad q_{234} = 0, \quad q_{456} = 0, \quad q_{345} = \tfrac14.$$

These values satisfy the restrictions $q_i = \tfrac12$, $i = 1, \cdots, 6$. Further, it is easily
seen that they yield the value $8(\tfrac14)^n$ for the sums on the right side of (2); hence
the lower bound given by (3) can be attained. This completes the proof of the
following theorem.

Theorem 1. $P\{x_i(1) < \nu_i < x_i(n),\ i = 1, 2, 3\} \ge 1 - 6(\tfrac12)^n + 8(\tfrac14)^n$, and
this lower bound is sharp.
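
The bound of Theorem 1 can be checked by direct computation. The following Python sketch is not part of the original paper: the function name `coverage`, the encoding of orthants as tuples of signs, and the helper `theorem1_bound` are assumptions made for illustration. It evaluates the inclusion-exclusion expansion (1) exactly from a table of orthant probabilities, then compares the extremal configuration (mass 1/4 on four orthants, all $q_{ij} = 1/4$) with the case of independent coordinates.

```python
from fractions import Fraction
from itertools import product


def coverage(orthants, n):
    """Exact probability that every median lies strictly between the sample extremes.

    orthants: dict mapping a tuple of signs (+1 or -1, one per coordinate,
              taken relative to the median) to its probability mass.
    n:        sample size (n >= 1).
    """
    t = len(next(iter(orthants)))
    total = Fraction(0)
    # For each coordinate choose: 0 = no event, +1 = all n points above the
    # median, -1 = all n points below it. Each choice is one intersection of
    # the events E_i in expansion (1), entering with sign (-1)^(number of events).
    for choice in product((0, 1, -1), repeat=t):
        mass = sum(
            (p for signs, p in orthants.items()
             if all(c == 0 or c == s for c, s in zip(choice, signs))),
            Fraction(0),
        )
        k = sum(c != 0 for c in choice)
        total += (-1) ** k * mass ** n
    return total


def theorem1_bound(n):
    """Right side of inequality (3) for t = 3."""
    return 1 - 6 * Fraction(1, 2) ** n + 8 * Fraction(1, 4) ** n


# Extremal configuration for t = 3 (illustrative encoding): mass 1/4 on the
# four orthants whose signs multiply to +1, so that every q_ij equals 1/4.
extremal = {s: Fraction(1, 4)
            for s in product((1, -1), repeat=3) if s[0] * s[1] * s[2] == 1}
# Independent coordinates: mass 1/8 on each of the eight orthants.
independent = {s: Fraction(1, 8) for s in product((1, -1), repeat=3)}

for n in range(2, 9):
    assert coverage(extremal, n) == theorem1_bound(n)
    assert coverage(independent, n) >= theorem1_bound(n)
    print(n, float(theorem1_bound(n)), float(coverage(independent, n)))
```

For $n = 3$, for instance, the extremal configuration gives coverage 3/8, while independent coordinates give 27/64, which illustrates the remark that independence does not attain the bound for $t > 2$.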


4. The general case. The method used in the preceding section does not seem
to generalize to higher dimensions; therefore a different method of attack is
introduced. It will now be necessary to assume that $t < 8$.

Let $q = \max_{i,j} q_{ij}$ and suppose there exists some $q_{ijk}$ possessing the value $q$.
Then, exploiting relations of the type

$$q_{ij} = q_{ijk} + q_{ij(k+t)} \quad \text{and} \quad q_i = q_{ij} + q_{i(j+t)} = \tfrac12,$$

one can easily prove the following lemma, where, as before, sums such as
$k + t$ are taken mod $2t$.

Lemma 1. Let $q = \max_{i,j} q_{ij}$ and suppose $q_{abc} = q$; then

$$q_{ij} = q_{(i+t)(j+t)} = q, \qquad i, j = a, b, c, \quad i < j,$$
$$q_{i(j+t)} = q_{(i+t)j} = \tfrac12 - q, \qquad i, j = a, b, c, \quad i < j,$$
$$q_{ab(c+t)} = q_{a(b+t)c} = q_{(a+t)bc} = 0,$$
$$q_{a(b+t)(c+t)} = q_{(a+t)b(c+t)} = q_{(a+t)(b+t)c} = \tfrac12 - q,$$
$$q_{(a+t)(b+t)(c+t)} = 2q - \tfrac12.$$


These are the only restrictions on the $q$'s with double and triple subscripts that
result directly from the lemma assumptions.

Now consider the problem of counting the number of double and triple
subscript $q$'s that assume the maximum value $q$. The following lemma gives
inequalities for those two numbers.

Lemma 2. Let there be $M$ quantities $q_{abc}$ that take on the value $q = \max_{i,j} q_{ij}$. Let
there be $M'$ quantities $q_{ab}$ that arise from these $M$ quantities $q_{abc}$ that have the value $q$.
Then

(a) if $q \ne \tfrac12$, $M' > M$ provided $t < 8$;

(b) if $q = \tfrac12$, $M' > M$ provided $t < 5$.
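
The thresholds $t < 8$ and $t < 5$ in Lemma 2 come from comparing binomial counts of pairs and triples of subscripts in its proof. As a small illustrative check (the helper name `pair_count_exceeds` is hypothetical and not from the paper), the two comparisons $2\binom{r}{2} > \binom{r}{3}$ and $2\binom{r}{2} > 2\binom{r}{3}$ can be tabulated directly:

```python
from math import comb


def pair_count_exceeds(r, triple_multiplier):
    """True when 2*C(r, 2) > triple_multiplier * C(r, 3)."""
    return 2 * comb(r, 2) > triple_multiplier * comb(r, 3)


# Case (a): at most C(r, 3) maximal triples; case (b): at most 2*C(r, 3).
case_a = [r for r in range(3, 12) if pair_count_exceeds(r, 1)]
case_b = [r for r in range(3, 12) if pair_count_exceeds(r, 2)]
print(case_a)  # values of r for which case (a) holds
print(case_b)  # values of r for which case (b) holds
```

The first comparison holds for $r = 3, \cdots, 7$ and fails at $r = 8$, where both sides equal 56; the second holds only for $r = 3, 4$ and fails at $r = 5$, where both sides equal 20.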

Proof. To each $q_{abc} = q$, Lemma 1 shows that there will correspond six
$q_{ab}$'s that have the value $q$; consequently $M'$ will certainly exceed $M$ unless
different $q_{abc}$'s having the value $q$ possess a sufficient number of $q_{ab}$'s in common.
Since the objective here is to show that $M' > M$, it will suffice to give a proof
for the least favorable situation in which $M$ is as large as possible and $M'$ is as
small as possible for any fixed number, $r$, of distinct subscripts.

(a) Suppose $q \ne \tfrac12$. Then one can form at most $\binom{r}{3}$ $q_{abc}$'s with the value $q$
and, by Lemma 1, there will be $2\binom{r}{2}$ corresponding $q_{ab}$'s with the value $q$. But

$$2\binom{r}{2} > \binom{r}{3} \quad \text{provided } r < 8.$$

(b) Suppose $q = \tfrac12$. Then, using the last conclusion of Lemma 1, one can form
at most $2\binom{r}{3}$ $q_{abc}$'s with the value $q$, while again there will be $2\binom{r}{2}$
corresponding $q_{ab}$'s with this value. But

$$2\binom{r}{2} > 2\binom{r}{3} \quad \text{provided } r < 5.$$

The preceding two lemmas will be used in the proofs of the following two
theorems.

Theorem 2. For $2 < t < 8$, a set of orthant probabilities that minimizes $\Phi$ for
all sample sizes must be one for which all $q_{ij} = \tfrac14$.

Proof. Consider a set of orthant probabilities for which the $q_{ij}$ are not all
equal in value. Let $P^*$ and $P$ denote the values of $\Phi$ for an orthant probability
configuration for which the $q_{ij}$ have the common value $\tfrac14$, and one for which
they do not have this common value, respectively. Then, by a Bonferroni
inequality [2], p. 100,

$$P^* \le 1 - 2t(\tfrac12)^n + 4\binom{t}{2}(\tfrac14)^n,$$

$$P \ge 1 - 2t(\tfrac12)^n + M'q^n - \sum_{i<j<k} q_{ijk}^n,$$

where $M'$ is the number of $q_{ij}$ assuming the maximum value $q$. Then

$$P - P^* \ge M'q^n - \sum_{i<j<k} q_{ijk}^n - 4\binom{t}{2}(\tfrac14)^n.$$




[Fig. 1]

Under the assumption that the $q_{ij}$ are not all equal in value, it follows that
$q > \tfrac14$ here. Now if $\max q_{ijk} < q$, the term $M'q^n$ will dominate the right side of
this inequality as $n \to \infty$; consequently $P > P^*$ for sufficiently large $n$. This
shows that no probability configuration in which the $q_{ij}$ are not equal can
possibly minimize $\Phi$ for all sample sizes, provided $\max q_{ijk} < q$.

If $\max q_{ijk} = q \ne \tfrac12$ and $t < 8$, then

$$P - P^* \ge M'q^n - Mq^n - o(q^n) - 4\binom{t}{2}(\tfrac14)^n = (M' - M)q^n - o(q^n) - 4\binom{t}{2}(\tfrac14)^n.$$

From Lemma 2, $M' - M > 0$; consequently the term $(M' - M)q^n$ dominates
the right side of this inequality, and therefore the same conclusion follows.

This same proof holds for $q = \tfrac12$, provided that $t < 5$. To complete the proof
of the theorem it is therefore necessary to show that $\Phi$ cannot be minimized for
all sample sizes when $q = \tfrac12$ and $t = 5$, 6, or 7.

It follows from Lemma 1 that $q_{(i+t)(j+t)} = \tfrac12$ if $q_{ij} = \tfrac12$. In the two dimensional
space of the variables $x_i$ and $x_j$, the probability distribution is therefore as
shown in Figure 1. This distribution implies that $\nu_i$ will be covered by the
confidence parallelepiped if and only if $\nu_j$ is covered. If the probability of covering
the median point is not zero for a configuration satisfying the preceding restriction
shown in Figure 1, then this probability can be decreased by constructing a
configuration for which the probability that $x_j - \nu_j$ will assume a given sign is
independent of the remaining $x$'s, without changing their probability distribution.
This is accomplished by halving all orthant probabilities and shifting one half
of each such probability mass to the corresponding orthant with the opposite
sign of $x_j - \nu_j$. That the probability of coverage has been decreased follows from
the fact that the conditional probability of $\nu_j$ being covered, given that the
remaining $\nu$'s are covered, will no longer be equal to one and that this shift in
probabilities does not affect the probability of the remaining $\nu$'s being covered.
For sufficiently large $n$, the probability of covering the median point cannot be
zero. Thus, $q = \tfrac12$ can be excluded from consideration in characterizing
distributions that minimize $\Phi$.³ The next theorem places further restrictions on any
minimizing configuration.

³ We are indebted to John W. Pratt for suggesting this method of proof for disposing of
$q = \tfrac12$. It is much shorter and neater than our original proof.




Theorem 3. For $t < 8$, a set of orthant probabilities that minimizes $\Phi$ for all
sample sizes must be one for which the maximum number of $q_{ijk}$ assume the value $\tfrac14$.

Proof. If $q_{ij} = \tfrac14$ for all $i, j$, then it is easily shown that $q_{ijkl} < \tfrac14$ for all $i, j, k, l$.
Any $q$ with more than four subscripts will of course also satisfy this inequality.
From Theorem 2 it follows that for a minimizing set of probabilities,

$$\Phi = 1 - 2t(\tfrac12)^n + 4\binom{t}{2}(\tfrac14)^n - \sum_{i<j<k} q_{ijk}^n + o\big((\tfrac14)^n\big).$$

Let $\lambda$ denote the number of $q_{ijk}$ having the value $\tfrac14$. Then it is clear that

$$\Phi = 1 - 2t(\tfrac12)^n + 4\binom{t}{2}(\tfrac14)^n - \lambda(\tfrac14)^n + o\big((\tfrac14)^n\big),$$

and that this quantity can be minimized for all $n$ by a given distribution only
if that distribution has maximum possible $\lambda$.


5. Numerical lower bounds. The preceding theorems, together with the
following lemma, will suffice to yield a theorem on the magnitude of sharp lower
bounds for $\Phi$.

Lemma 3. If $q_{ij} = \tfrac14$ for all $i, j$ and $q_{ijk} = \tfrac14$ for some $k$, then

$$q_{ijk(l+r)} = q_{i(j+t)(k+t)(l+r)} = q_{(i+t)j(k+t)(l+r)} = q_{(i+t)(j+t)k(l+r)} = \tfrac18,$$

$$q_{ij(k+t)(l+r)} = q_{i(j+t)k(l+r)} = q_{(i+t)jk(l+r)} = q_{(i+t)(j+t)(k+t)(l+r)} = 0$$

for $l \ne i, j, k$ and $r = 0$ and $r = t$.

Proof. It suffices to consider orthant probabilities in the four dimensional
space of the variables $x_i, x_j, x_k, x_l$ as shown in Figure 2. The condition $q_{ijk} = \tfrac14$
implies that a; + a = }. Imposing the condition that each gag = } yields 
a, = G3 = Ag = Ag = Ay = Ay, = Ay = Aye = § and zero values for the remaining 
a’s. This suffices to prove the lemma. 

The desired lower bounds are now given by the following theorem. 


Theorem 4. Under the assumption that there exists a set of orthant probabilities
that minimizes $\Phi$ for all sample sizes, sharp lower bounds for $\Phi$ for $2 < t < 8$ are
given by the formula

$$\Phi \ge 1 - 2t(\tfrac12)^n + 4(3t - 7)(\tfrac14)^n - 16(t - 3)(\tfrac18)^n.$$

[Fig. 3]

Proof. Theorem 1 has already given these bounds for $t = 3$; therefore consider
$t > 3$. The method of proof is essentially the same for all values of $t$ satisfying
$3 < t < 8$; consequently only the proof for $t = 5$ will be given to illustrate the
nature of the proofs.


Without loss of generality, suppose that $q_{123}$ is a $q_{ijk}$ that assumes the value $\tfrac14$.
By Lemma 3, it then follows, for example, that

$$q_{1234} = q_{1239} = q_{1235} = q_{123(10)} = \tfrac18.$$


These values, together with the other values given by this lemma, suffice to 
yield the orthant configuration shown in Figure 3. The lemma conclusions re- 
quire, for example, that 


(4) a + by = §, a + a5 = 3, as + bs = 8, bi + bs = 3, 
and therefore that b; = as and bs = aq. 

Since a configuration is to be chosen that has the maximum number of qi; 
possessing the value 3, suppose that gin, = 3. Then a; + b; = 3, but (4) shows 
that this choice is not possible. It follows readily that no q;;, with two indices in 
common with gw; can be chosen. Suppose next that gu; = 4. Then, by Lemma 3, 
Gis = iss7 = Gins = Qisss = %- The values of the a’s and b’s in Figure 3 are 
now completely determined and are given by a; = a4 = de = a; = h = b; = 
bs = bs = 3, and zero values for the remaining symbols. 

If instead of assuming that qus had the value }, one had chosen some other 
dik With one index in common with qgy»;, then the same configuration would 





have been obtained except for a reflection in an axis. The value of $\Phi$ would, of
course, be unaffected. If one now uses the configuration values just obtained to
calculate the values of the respective terms in the Bonferroni expansion of $\Phi$,
one will obtain the value

$$\Phi = 1 - 10(\tfrac12)^n + 40(\tfrac14)^n - [8(\tfrac14)^n + 64(\tfrac18)^n + 8(0)^n] + [40(\tfrac18)^n + 40(0)^n] - 8(\tfrac18)^n.$$

Consequently, collecting terms, it follows that

$$\Phi \ge 1 - 10(\tfrac12)^n + 32(\tfrac14)^n - 32(\tfrac18)^n$$

for $t = 5$, and that this result is sharp.
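
The collected formula can be checked against an explicit configuration. The sketch below is an illustration only, not the authors' construction as drawn in Figure 3: it assumes configurations obtained by putting equal mass on the orthants whose signs satisfy chosen product relations (signs of coordinates 1, 2, 3 multiplying to $+1$, and for $t = 5$ also those of coordinates 1, 4, 5). These reproduce the term counts displayed above (for $t = 5$: eight triples at 1/4, sixty-four at 1/8, and so on). The names `coverage`, `config` and `theorem4_bound` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod


def coverage(orthants, n):
    """Exact coverage probability via the inclusion-exclusion expansion (1)."""
    t = len(next(iter(orthants)))
    total = Fraction(0)
    # choice[i] = 0: no event; +1 / -1: all n points above / below median i.
    for choice in product((0, 1, -1), repeat=t):
        mass = sum(
            (p for signs, p in orthants.items()
             if all(c == 0 or c == s for c, s in zip(choice, signs))),
            Fraction(0),
        )
        total += (-1) ** sum(c != 0 for c in choice) * mass ** n
    return total


def config(t, relations):
    """Equal mass on orthants whose signs multiply to +1 over each relation.

    relations: iterable of tuples of 0-based coordinate indices (assumed form).
    """
    keep = [s for s in product((1, -1), repeat=t)
            if all(prod(s[i] for i in rel) == 1 for rel in relations)]
    return {s: Fraction(1, len(keep)) for s in keep}


def theorem4_bound(t, n):
    """Right side of the formula in Theorem 4."""
    return (1 - 2 * t * Fraction(1, 2) ** n
            + 4 * (3 * t - 7) * Fraction(1, 4) ** n
            - 16 * (t - 3) * Fraction(1, 8) ** n)


config4 = config(4, [(0, 1, 2)])
config5 = config(5, [(0, 1, 2), (0, 3, 4)])

for n in range(1, 9):
    assert coverage(config4, n) == theorem4_bound(4, n)
    assert coverage(config5, n) == theorem4_bound(5, n)
    # Independent coordinates give (1 - 2^(1-n))^t, never below the bound.
    assert (1 - 2 * Fraction(1, 2) ** n) ** 5 >= theorem4_bound(5, n)
```

For $t = 5$ and $n = 3$ the bound equals 3/16, compared with $(3/4)^5 = 243/1024$ for independent coordinates.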

Similar, but considerably more tedious, methods will demonstrate the cor- 
rectness of the formula given by the theorem for the larger values of t. The 
demonstration for ¢ = 4 is of course the simplest one. 

Although the formula of Theorem 4 has been demonstrated only for $2 < t < 8$,
it is conjectured that the formula holds in general and that there always exists a
configuration of orthant probabilities that minimizes $\Phi$ for all sample sizes.


6. Other order statistics. As n increases, the confidence coefficient will in- 
crease, but so will the size of the confidence parallelepiped. A smaller size 
parallelepiped, at the expense of a smaller confidence coefficient could be ob- 
tained for a symmetric distribution, for example, by taking means of consecutive 
pairs of samples. The median for the new variables will be the same as for the 
old variables. This averaging would tend to decrease the size of the parallelepiped 


as well as the confidence coefficient. If the sample were very large one could use 
means of more than two consecutive samples. For the general situation, in order 
to obtain useful confidence regions for various sample sizes, it would be necessary 
to find corresponding inequalities for other order statistics. 
REFERENCES 
[1] Olive Jean Dunn, "Estimation of the medians for dependent variables," Ann. Math.
Stat., Vol. 30 (1959), pp. 192-197.
[2] William Feller, An Introduction to Probability Theory and its Applications, Vol. 1,
2nd Ed., John Wiley and Sons, New York, 1957.





DISTRIBUTION FREE TESTS OF INDEPENDENCE BASED ON THE
SAMPLE DISTRIBUTION FUNCTION

By J. R. Blum,¹ J. Kiefer,² and M. Rosenblatt³

Sandia Corporation and Indiana University; Cornell University;
and Brown University


0. Summary. Certain tests of independence based on the sample distribution 
function (d.f.) possess power properties superior to those of other tests of inde- 
pendence previously discussed in the literature. The characteristic functions of 
the limiting d.f.’s of a class of such test criteria are obtained, and the correspond- 
ing d.f. is tabled in the bivariate case, where the test is equivalent to one originally 
proposed by Hoeffding [4]. A discussion is included of the computational prob- 
lems which arise in the inversion of characteristic functions of this type. Tech- 


niques for computing the statistics and for approximating the tail probabilities 
are considered. 


1. Introduction. The idea of using various simple functionals of the sample d.f.
of vector chance variables in order to test the independence of components is a
natural one. Only the difficult distribution theory prevents the use of such tests
and the resulting achievement of improvement in power performance over all
currently used tests. Specifically, let $\Omega$ be the class of continuous d.f.'s on
$m$-dimensional Euclidean space $R^m$, and let $\omega$ be the subclass consisting of every
member of $\Omega$ which is a product of its associated one-dimensional marginal
d.f.'s. Let $X_1, \cdots, X_n$ be independent random $m$-vectors with common
unknown d.f. $F$, a member of $\Omega$, and suppose that it is desired to test the hypothesis
$H_0: F \in \omega$ against the alternative $H_1: F \in \Omega - \omega$. Let $S_n$ be the sample d.f. of
$X_1, \cdots, X_n$; i.e., for $r$ in $R^m$, $S_n(r)$ is $n^{-1}$ times the number of $X_j$ all of whose
components are less than or equal to the corresponding components of $r$, i.e.,

$$S_n(r_1, \cdots, r_m) = \frac{1}{n} \sum_{j=1}^{n} \prod_{i=1}^{m} \phi_{r_i}(X_j^{(i)}),$$

where $X_j = (X_j^{(1)}, \cdots, X_j^{(m)})$ and

$$\phi_r(x) = \begin{cases} 1 & \text{if } x \le r, \\ 0 & \text{if } x > r. \end{cases}$$

Write $S_{nj}$ for the marginal d.f. associated with the $j$th component of $S_n$ (i.e.,
for the sample d.f. of the $j$th component of the $X_i$), and let

$$T_n(r) = S_n(r) - \prod_{j=1}^{m} S_{nj}(r_j). \tag{1.1}$$


Received April 24, 1960; revised September 15, 1960.
¹ Research sponsored by the Office of Ordnance Research, U.S. Army, under Contract
No. DA-33-008-ORD-965.
² Research sponsored by the Office of Naval Research.
³ Research sponsored by the Office of Naval Research.


Then many tests based on $T_n$ will have good power properties (see Section 4)
and will be similar on $\omega$. For example, the critical region based on large values of

$$A_n = \sup_r |T_n(r)|,$$

a statistic constructed in the spirit of the Kolmogorov-Smirnov statistics,
evidently has such properties. It follows from the results of [8] that the d.f. of $n^{1/2}A_n$
under $H_0$ differs from unity by less than $c_1 \exp(-c_2 z^2)$ for all $n$ and all arguments
$z > 0$, where the $c_i$ are positive constants. It can be shown that the limiting
d.f. of $n^{1/2}A_n$ exists (and hence has the same behavior with $z$); since the proof is
somewhat long but uses mainly ideas like those of [8], it will not be given here.
The calculation of this asymptotic distribution seems formidable; it is equivalent
to the computation of the d.f. of the maximum of a particular Gaussian process
with multidimensional time parameter. A corresponding calculation of exact
(nonasymptotic) distributions for various values of $n$ can, of course, be achieved
numerically, but such calculations are extremely laborious even if done by
machines for rather small $n$.

Another critical region, constructed in the spirit of the von Mises-Cramér
tests, is that based on large values of

$$B_n = \int [T_n(r)]^2 \, dS_n(r). \tag{1.2}$$
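
For a concrete reading of the two statistics in the bivariate case, the following Python sketch evaluates $T_n$, $A_n$ and $B_n$ straight from their definitions. It is a naive illustration, not the computational form the authors give in Section 5; the function names are hypothetical, and the sample is assumed to have no ties, in keeping with the continuity assumption.

```python
from fractions import Fraction


def T_n(xs, ys, x, y):
    """T_n(x, y) = S_n(x, y) - S_n1(x) * S_n2(y) for a bivariate sample."""
    n = len(xs)
    joint = Fraction(sum(1 for a, b in zip(xs, ys) if a <= x and b <= y), n)
    first = Fraction(sum(1 for a in xs if a <= x), n)
    second = Fraction(sum(1 for b in ys if b <= y), n)
    return joint - first * second


def A_n(xs, ys):
    """sup |T_n|: T_n is a step function that changes only at sample
    coordinates, so the supremum is attained on the grid of observed values."""
    return max(abs(T_n(xs, ys, x, y)) for x in xs for y in ys)


def B_n(xs, ys):
    """Integral of T_n^2 against dS_n, which puts mass 1/n on each sample point."""
    n = len(xs)
    return sum(T_n(xs, ys, x, y) ** 2 for x, y in zip(xs, ys)) / n


# A perfectly dependent toy sample of size 4 (made-up data for illustration).
xs = [1, 2, 3, 4]
ys = [1, 2, 3, 4]
print(A_n(xs, ys), B_n(xs, ys))
```

Both functions cost $O(n^2)$ or $O(n^3)$ operations as written, which is adequate for small samples but is exactly the unwieldiness that motivates a more convenient form.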


Adapting the well known technique of Kac and Siegert [5] to the present setting
(such a multidimensional computation was first carried out in [12]), we shall
obtain the characteristic function of the asymptotic distribution of $nB_n$ under
$H_0$ when $m = 2$ (Section 2), in which case the test turns out to be equivalent
to one constructed on other heuristic grounds by Hoeffding [4] (see Section 5
below for the form in which Hoeffding stated his test). Certain variants of $nB_n$
in the case $m > 2$ will be considered in Section 3.

In Section 4 questions of distribution under $H_1$, power, and estimation, and
certain modifications, will be taken up. A particularly simple and
computationally convenient form of the tests is given in Section 5. In Section 6 an
approximation is suggested to the tail of the limiting distribution, which is compared
with the exact results; this idea clearly has useful applications in many other
problems. Methods for computing distributions of weighted sums of chi-square
variables, which are relevant for computing the asymptotic distribution of $nB_n$
as well as many other important distributions in statistics, are discussed in
Section 7. The asymptotic distribution of $nB_n$ for the case $m = 2$ is tabulated in
Section 8.


2. The case $m = 2$. The statistic $B_n$ is clearly distribution-free for $F$ in $\omega$.
As usual, we can therefore carry out our computations when $F$ is the uniform
distribution on the unit square $I^2$. Let $T(x, y)$ be a separable Gaussian process
depending on the "time" parameter $(x, y)$ for $(x, y)$ in $I^2$, and with

$$ET(x, y) = 0, \qquad ET(x, y)T(u, v) = [\min(x, u) - xu][\min(y, v) - yv]. \tag{2.1}$$


A routine computation (most easily accomplished by writing

$$S_{n1}(x)S_{n2}(y) = xS_{n2}(y) + yS_{n1}(x) - xy + O_p(n^{-1}))$$

shows that (2.1) gives the mean and the asymptotic covariance of the random
function $n^{1/2}T_n$. It follows from the appropriate analogue in the present case of
the corrected argument of [12] or of the argument of Section 2 of [7] (the proof
being very similar here) that the asymptotic distribution of $nB_n$ is the same as
that of

$$B = \int_0^1 \int_0^1 T^2(x, y) \, dx \, dy.$$

Writing $s = (x, y)$, $t = (u, v)$, and $K(s, t)$ for the last member of (2.1), we
consider the integral equation

$$\int_{I^2} K(s, t)\varphi(t) \, dt = \lambda\varphi(s). \tag{2.2}$$

It is easily seen that the eigenvalues and (complete set of) eigenfunctions of
(2.2) are $1/\pi^4 j^2 k^2$ and $2(\sin \pi jx)(\sin \pi ky)$; $j, k = 1, 2, \cdots$. Hence, exactly as
in [5] and [12], we conclude that

$$Ee^{izB} = \prod_{j,k=1}^{\infty} (1 - 2iz/\pi^4 j^2 k^2)^{-1/2}. \tag{2.3}$$
An equivalent result was first stated by Hoeffding [4], who stated two other 
different methods for obtaining (2.3). The corresponding d.f. of B is tabled in 
Section 8. 
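
As a quick consistency check on (2.3), one may note that it is the characteristic function of a weighted sum of independent chi-square variables with one degree of freedom each, so the first two moments of $B$ follow from the standard values $\sum j^{-2} = \pi^2/6$ and $\sum j^{-4} = \pi^4/90$. The symbols $\lambda_{jk}$ and $X_{jk}$ below are introduced here only for this sketch.

```latex
B \overset{d}{=} \sum_{j,k=1}^{\infty} \lambda_{jk} X_{jk}^2,
\qquad \lambda_{jk} = \frac{1}{\pi^4 j^2 k^2},
\qquad X_{jk} \ \text{independent } N(0,1),
\\[4pt]
EB = \sum_{j,k} \lambda_{jk}
   = \frac{1}{\pi^4}\Big(\sum_{j} \frac{1}{j^2}\Big)^2
   = \frac{1}{\pi^4}\Big(\frac{\pi^2}{6}\Big)^2 = \frac{1}{36},
\\[4pt]
\operatorname{Var} B = 2\sum_{j,k} \lambda_{jk}^2
   = \frac{2}{\pi^8}\Big(\sum_{j} \frac{1}{j^4}\Big)^2
   = \frac{2}{\pi^8}\Big(\frac{\pi^4}{90}\Big)^2 = \frac{1}{4050}.
```

These two values give a rough sense of the scale on which $nB_n$ fluctuates under the null hypothesis.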

It is obvious that, because of the factorizability of $K(s, t)$, we can similarly
obtain the characteristic function of the limiting d.f. for the case where a weight
function of the form $W(S_{n1}(r_1))W(S_{n2}(r_2))$ is inserted in the integrand in the
expression for $B_n$; one has merely to use the corresponding one-dimensional
results on weighted $\omega^2$ statistics (see, e.g., [1], [5]) to obtain the eigenvalues.


3. The case $m > 2$. For the sake of brevity we shall discuss in detail only the
case $m = 3$; the corresponding results for other cases require only obvious
changes.

Suppose, then, that $F$ is the uniform distribution on the unit cube. Another
routine computation (most easily accomplished in a manner analogous to that
suggested in Section 2) yields

$$\lim_{n \to \infty} nET_n(x, y, z)T_n(u, v, w) = \min(x, u)\min(y, v)\min(z, w) - yzvw\min(x, u) - xzuw\min(y, v) - xyuv\min(z, w) + 2xyzuvw. \tag{3.1}$$

This kernel does not permit the simple treatment which that of (2.1) did, and
the eigenvalues are at present unknown. This suggests that we look for a function
$T_n'$ of $S_n$ for which

$$\lim_{n \to \infty} nET_n'(x, y, z)T_n'(u, v, w) = [\min(x, u) - xu][\min(y, v) - yv][\min(z, w) - zw]. \tag{3.2}$$


Denoting by $S_{njk}$ the 2-dimensional marginal d.f. of $S_n$ corresponding to the
$j$th and $k$th coordinates (sample d.f. of the $j$th and $k$th components of the $X_i$),
we easily verify that the function $T_n'$ defined by

$$T_n'(x, y, z) = S_n(x, y, z) - S_{n1}(x)S_{n23}(y, z) - S_{n2}(y)S_{n13}(x, z) - S_{n3}(z)S_{n12}(x, y) + 2S_{n1}(x)S_{n2}(y)S_{n3}(z)$$

does in fact satisfy (3.2). It follows, in the manner of Section 2, that if

$$B_n' = \int [T_n'(r)]^2 \, dS_n(r), \tag{3.3}$$

then for $F$ in $\omega$ we have

$$\lim_{n \to \infty} Ee^{iznB_n'} = \prod_{j_1, j_2, j_3 = 1}^{\infty} (1 - 2iz/\pi^6 j_1^2 j_2^2 j_3^2)^{-1/2}.$$
Thus, the asymptotic distribution of $nB_n'$ can be tabulated in the manner of the
tabulation of Section 8. However, a test for independence based only on the
statistic $B_n'$ is not to be recommended, since the power of any such test will be
small for many alternatives which are far from $\omega$; for example, it is clear that
$ET_n'(r) = 0$ if $F$ is of the form $F(x, y, z) = F_1(x)F_{23}(y, z)$. A solution to this
difficulty can be found in the fact that, if the components of the $X_i$ are pairwise
independent, then $ET_n'(r) = 0$ for all $r$ if and only if $F \in \omega$. Thus, the three
2-dimensional sample d.f.'s of the components of the $X_i$ can be used to detect
departure from pairwise independence, while $B_n'$ detects other possible
departures from independence. There are obviously many ways in which these two
effects can be combined in constructing a test, and only one of them will be made
explicit here. Let $T_{njk}(p, q) = S_{njk}(p, q) - S_{nj}(p)S_{nk}(q)$, and let

$$B_{njk} = \int [T_{njk}(r)]^2 \, dS_{njk}(r).$$


A computation of covariances readily shows that the functions $n^{1/2}T_n'$, $n^{1/2}T_{n12}$,
$n^{1/2}T_{n13}$, and $n^{1/2}T_{n23}$ are asymptotically independent. Thus, arguing in the same
manner as before, we conclude that the statistic $C_n$, defined by

$$C_n = n(B_{n12} + B_{n13} + B_{n23} + bB_n'), \tag{3.4}$$

where $b$ is a positive constant, has the asymptotic distribution with
characteristic function

$$\lim_{n \to \infty} Ee^{izC_n} = \prod_{j,k=1}^{\infty} (1 - 2iz/\pi^4 j^2 k^2)^{-3/2} \prod_{j_1, j_2, j_3 = 1}^{\infty} (1 - 2biz/\pi^6 j_1^2 j_2^2 j_3^2)^{-1/2}. \tag{3.5}$$

The corresponding asymptotic distribution can be tabulated in the manner of
Section 8. The power properties of a critical region consisting of large values of
$C_n$ can be obtained as in Section 4.


4. Asymptotic distribution under $H_1$; power; estimation; modifications. We
consider the case $m = 2$ throughout this section; the analogous results obviously
all hold when $m > 2$.

If $F(x, y)$ is not of the form $G(x)H(y)$, where $G$ and $H$ are the two continuous
marginal d.f.'s of $X_1$, the limiting d.f. of $n^{1/2}B_n$ can be obtained by noting that
$n^{1/2}B_n$ is asymptotically


n' [[ UeU (a, wP alSu(x, y) — Fx, yD) 


+ n' | [U, (x, y) — EU,(2, y)] dF(ax, y) + 0,(1), 
where
$$U_n(x, y) = S_n(x, y) - G(x)S_{n2}(y) - H(y)S_{n1}(x) + G(x)H(y).$$
Writing
$$\Delta(x, y) = F(x, y) - G(x)H(y),$$
ds.9) = I [b.(z) — G(u)Ilbe(y) — H(v)]A(u, v) dF(u, v), 


we obtain that n[B, — ff A°(2, y) dF (2, y)] is asymptotically normal with mean 
0 and the same variance as the random variable A°(X, Y) + 2(X, Y), where 
(X, Y) is distributed according to F. An equivalent form of this result was given 
by Hoeffding [4]. 

Of greater interest for most applications is the limiting d.f. of $nB_n$ when we
consider a sequence $F^{(n)}$ of alternatives on $I^2$ for which

$$n^{1/2}[F^{(n)}(x, y) - G^{(n)}(x)H^{(n)}(y)] \to q(x, y)$$

(finite and continuous) as $n \to \infty$. We obtain, using arguments similar to those
of Section 2, that the limiting d.f. of $nB_n$ is the same as the d.f. of

$$B^* = \int_0^1 \int_0^1 [T(x, y) + q(x, y)]^2 \, dx \, dy.$$

Recalling the eigenvalues and eigenfunctions of $K$ obtained in Section 2, we
can write

$$T(x, y) = \sum_{j,k=1}^{\infty} 2\pi^{-2} j^{-1} k^{-1} (\sin \pi jx)(\sin \pi ky) X_{jk},$$

where the $X_{jk}$ are independent normal variates with means 0 and variances 1.
Hence, writing

$$q_{jk} = \int_0^1 \int_0^1 2q(x, y)(\sin \pi jx)(\sin \pi ky) \, dx \, dy,$$

we obtain for the limiting characteristic function of $nB_n$,

$$Ee^{izB^*} = \Big\{\prod_{j,k=1}^{\infty} \Big(1 - \frac{2iz}{\pi^4 j^2 k^2}\Big)^{-1/2}\Big\} \exp\Big\{-\tfrac12 \sum_{j,k} q_{jk}^2 \pi^4 j^2 k^2 + \tfrac12 \sum_{j,k} q_{jk}^2 \pi^4 j^2 k^2 \Big(1 - \frac{2iz}{\pi^4 j^2 k^2}\Big)^{-1}\Big\}.$$
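
A brief sketch of where a characteristic function of this noncentral type comes from may be helpful. It uses only the expansion of $T$ in the eigenfunctions and the standard characteristic function of a shifted squared normal variate; the symbols $\lambda_{jk}$ and $\mu_{jk}$ are shorthand introduced here and do not appear in the original.

```latex
\int_0^1\!\!\int_0^1 [T(x,y) + q(x,y)]^2 \, dx \, dy
  = \sum_{j,k} \big(\lambda_{jk}^{1/2} X_{jk} + q_{jk}\big)^2
  = \sum_{j,k} \lambda_{jk} \big(X_{jk} + \mu_{jk}\big)^2,
\qquad \lambda_{jk} = \frac{1}{\pi^4 j^2 k^2}, \quad
\mu_{jk}^2 = \frac{q_{jk}^2}{\lambda_{jk}} = q_{jk}^2 \pi^4 j^2 k^2,
\\[4pt]
E \exp\{ iz\lambda (X + \mu)^2 \}
  = (1 - 2iz\lambda)^{-1/2} \exp\Big\{ \frac{iz\lambda\mu^2}{1 - 2iz\lambda} \Big\}
  \quad \text{for } X \sim N(0,1),
\\[4pt]
\frac{iz\lambda\mu^2}{1 - 2iz\lambda}
  = -\tfrac12 \mu^2 + \tfrac12 \mu^2 (1 - 2iz\lambda)^{-1}.
```

Multiplying over $j, k$ by independence gives a product of the central factors times the exponential of the summed noncentral terms; setting every $q_{jk} = 0$ recovers (2.3).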
For simple $q(x, y)$'s (e.g., where all but a finite number of the $q_{jk}$ are zero),
one could easily compute tables of the power, in the manner of Section 8. Even
for general $q(x, y)$, an argument like that of Section 6 would yield information.
Without obtaining such quantitative results, we can easily give a lower bound
on the power. The power properties of tests based on the sample d.f. have been
discussed in detail in [6] and [7], and it will suffice to state briefly the analogous
results for the problems treated in the present paper. Such results will clearly
apply for arbitrary $m$, and for the sake of clarity and brevity we shall only state
them for the case $m = 2$, the extensions to $m > 2$ being obvious.
Let $F$ be a d.f. on $R^2$ and let $F_1$ and $F_2$ be the corresponding marginal d.f.'s.
Write

$$\delta_F = \sup_{x,y} |F(x, y) - F_1(x)F_2(y)|$$

and

$$\gamma_F = \Big\{\iint [F(x, y) - F_1(x)F_2(y)]^2 \, dF_1(x) \, dF_2(y)\Big\}^{1/2}.$$

(A similar treatment applies if the integrating measure is replaced by $F$ in the
definition of $\gamma_F$.) Then, for $0 < \alpha, \beta < 1$, there is a constant $C(\alpha, \beta)$ such that,
for each $d > 0$, there is a critical region based on large values of $A_n$ with
$n < C(\alpha, \beta)d^{-2}$ and which has size $\le \alpha$ on $\omega$ and power $\ge \beta$ for all alternatives
$F$ for which $\delta_F \ge d$. Thus, the behavior of the required sample size as a function
of $d$ is of the same order as in common parametric (e.g., Gaussian) examples.
The same conclusion for $B_n$ holds if $\delta_F$ is replaced by $\gamma_F$ in the above.

It is clear that this guaranteed behavior of the power function against all
alternatives is far superior to that of the other nonparametric tests previously
described in the literature (outside of [4]). Many of the latter have zero efficiency
compared with tests based on $A_n$ or $B_n$. Perhaps the best of these classical tests
is the chi-square test with the observations divided into the $k_n^2$ classes determined
by $k_n - 1$ equally spaced values of $S_{n1}(x)$ and $S_{n2}(y)$. The optimum choice of
$k_n$ has not been investigated, but it is reasonable to suppose that the power
function for the optimum choice will behave no better, and possibly worse, than
that of the best chi-square test of goodness of fit (see [10], [6]). If this is so, we
would conclude that, if $N$ observations are required by the test based on
$A_n$ (resp., $B_n$) to achieve a goal in terms of $\delta_F$ (resp., $\gamma_F$) like that described in the
previous paragraph, then at least $C(\alpha, \beta)N^{5/4}$ observations are required by the
best chi-square test.

We remark that the relationship between 6, and yr is easily seen to be 
dy = vr = Cdy, where C > 0. 







In many applications it is desirable not merely to test for dependence, but 
rather to estimate the type of dependence. There are many possible formulations 
of this problem. If it is desired to estimate the entire function $F - F_1F_2$, then,
for almost any reasonable weight function, a modification of the arguments of
[9] shows that $S_n - S_{n1}S_{n2}$ is asymptotically a minimax estimator (as $n \to \infty$).
Similar results hold for the problem of estimating various functionals of $F, F_1, F_2$.

These results on power and estimation also apply under such obvious modifications
as that of considering the probabilities and empiric frequencies in all
rectangles instead of only in third quadrants, of inserting a weight function in the
definition of $A_n$ and $B_n$, etc. Also, as in [6], [7], [8], the results on size and
minimum power are not materially affected if discontinuous distributions are
admitted. We note also that, just as in [7], the results are unaffected if the
integrating measure $S_n$ is replaced by $S_{n1}S_{n2} \cdots S_{nm}$ in the definition of $B_n$
(many other functions could be used, too); in fact, the limiting d.f. is exactly
the same with this modification.


5. Computation of the statistics. The statistic $B_n$ (or one of its variants, such
as those mentioned at the end of Section 4) is rather unwieldy for practical
computations in its form (1.2), even if the integral is rewritten as a sum to
take account of the atomicity of the integrating measure. The form originally
suggested by Hoeffding for his statistic (which differs slightly from $B_n$) for
$n \ge 5$ was

(5.1) $D_n = \frac{1}{4n(n-1)(n-2)(n-3)(n-4)} \sum'' \prod_{j=1}^{2} [\phi_{X_{i_1}^{(j)}}(X_{i_2}^{(j)}) - \phi_{X_{i_1}^{(j)}}(X_{i_3}^{(j)})][\phi_{X_{i_1}^{(j)}}(X_{i_4}^{(j)}) - \phi_{X_{i_1}^{(j)}}(X_{i_5}^{(j)})],$

where $\phi$ is defined as in Section 1 and $\sum''$ denotes the sum over all 5-tuples
$(i_1, \cdots, i_5)$ of different integers, $1 \le i_j \le n$. Another form of $D_n$, for use in
computations, was given by Hoeffding in Section 5 of his paper.

A more convenient form than (1.2) for computational purposes is obtained by
noting that, when m = 2,

$n^2 T_n(X_j^{(1)}, X_j^{(2)}) = N_1(j)N_4(j) - N_2(j)N_3(j),$

where $N_1(j), N_2(j), N_3(j), N_4(j)$ are the numbers of points lying, respectively,
in the regions $\{(x, y) \mid x \le X_j^{(1)}, y \le X_j^{(2)}\}$, $\{(x, y) \mid x > X_j^{(1)}, y \le X_j^{(2)}\}$,
$\{(x, y) \mid x \le X_j^{(1)}, y > X_j^{(2)}\}$, $\{(x, y) \mid x > X_j^{(1)}, y > X_j^{(2)}\}$. Thus, we have only to
count the number of points lying in each of the four regions determined by the
vertical and horizontal lines through $X_j = (X_j^{(1)}, X_j^{(2)})$, and compute

(5.2) $B_n = n^{-5} \sum_{j=1}^{n} [N_1(j)N_4(j) - N_2(j)N_3(j)]^2.$
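
The counting form (5.2) can be programmed in a few lines. The following Python sketch is illustrative only: the function name `bkr_statistic`, the use of NumPy and the simple O(n^2) loop are choices made here and are not part of the paper, and the normalization $n^{-5}$ assumes that $B_n$ is the integral of $T_n^2$ with respect to $S_n$. Ties are resolved by the conventions "$\le$" and ">" used in the four regions above.

```python
import numpy as np

def bkr_statistic(x, y):
    """Statistic B_n for m = 2, computed by quadrant counts as in (5.2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    total = 0.0
    for j in range(n):
        le_x = x <= x[j]          # points to the left of (or on) the vertical line
        le_y = y <= y[j]          # points below (or on) the horizontal line
        n1 = int(np.sum(le_x & le_y))     # x <= X_j, y <= Y_j
        n2 = int(np.sum(~le_x & le_y))    # x >  X_j, y <= Y_j
        n3 = int(np.sum(le_x & ~le_y))    # x <= X_j, y >  Y_j
        n4 = int(np.sum(~le_x & ~le_y))   # x >  X_j, y >  Y_j
        total += float(n1 * n4 - n2 * n3) ** 2
    return total / n ** 5

if __name__ == "__main__":
    rng = np.random.default_rng(0)        # hypothetical sample, for illustration
    u, v = rng.random(50), rng.random(50)
    print(bkr_statistic(u, v))
```

The quantity referred to the tables of Section 8 would then be `0.5 * np.pi**4 * n * bkr_statistic(x, y)`.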


Similarly, when m > 2 a statistic such as that of (3.4) can easily be written
in terms of the numbers of points in each of the $2^m$ orthants determined by the
m hyperplanes through $X_j = (X_j^{(1)}, \cdots, X_j^{(m)})$ and parallel to the coordinate
hyperplanes. Thus, for m = 3 the statistic $C_n$ can be written in terms of quantities
$N_1(j), \cdots, N_8(j)$. We omit the details.


6. Approximation to the tail probabilities of the limiting distribution. We again 
limit ourselves to the case m = 2, although the discussion which follows even 
has obvious applications to problems outside this paper. 

The Laplace transform of the asymptotic distribution of the test statistic $nB_n$
under the null hypothesis is

(6.1) $\prod_{j,k=1}^{\infty} [1 + 2t/(\pi^4 j^2 k^2)]^{-1/2}.$

The singularity of this expression in the complex t-plane which has largest
real part is located at $t = -(\pi^4/2)$. In the neighborhood of $t = -(\pi^4/2)$ the
expression (6.1) has the same behavior as

(6.2) $(1 + 2t/\pi^4)^{-1/2} \prod_{(j,k) \ne (1,1)} [1 - 1/(j^2 k^2)]^{-1/2}.$

Making use of the relation $(\sin z)/z = \prod_{n=1}^{\infty} \{1 - [z^2/(\pi^2 n^2)]\}$ we see that

$\prod_{(j,k) \ne (1,1)} [1 - 1/(j^2 k^2)]^{-1/2} = 2^{1/2} \prod_{n=2}^{\infty} [(\pi/n)/\sin(\pi/n)]^{1/2}.$

We have been unable to invert (6.1) directly. However, some of the
Tauberian theorems for Laplace transforms (see e.g., [2], p. 269) suggest that if
we invert (6.2) we should approximate the tail of our distribution reasonably
well. Thus we are led to approximate our distribution in the tail by

(6.3) $2^{1/2} \prod_{n=2}^{\infty} [(\pi/n)/\sin(\pi/n)]^{1/2} \, P\{X^2/\pi^4 > t\},$

where X is a normal random variable with mean zero and unit variance.

The tabulation which follows gives the exact value of 1 - F(y) and the
corresponding tail approximation, where F is the limiting distribution of $\pi^4 nB_n/2$,
tabled in Section 8:

  y     1 - F(y)     Tail approximation
  2     .145         .115
  3     .0414        .0361
  4     .0130        .0118
  5     .00424       .00395
  6     .00142       .00134
  7     .00048       .00046

Thus, the agreement is quite good for even moderate values of the size. A similar 
approximate computation for the asymptotic distribution of the von Mises 
statistic tabled in [1] also gave good agreement. 
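
The approximation (6.3) is easy to evaluate numerically. The sketch below is only an illustration under stated assumptions: the infinite product is truncated at a hypothetical `n_terms` factors, the argument y is taken on the scale of $\pi^4 nB_n/2$ (so that $P\{X^2/\pi^4 > t\}$ becomes $P\{X^2 > 2y\} = \mathrm{erfc}(y^{1/2})$), and the function names are ours. Its output is of the same size as the tabulated approximations but need not agree with them to all printed digits.

```python
import math

def tail_constant(n_terms=100_000):
    """2^{1/2} * prod_{n>=2} [(pi/n)/sin(pi/n)]^{1/2}, with the product truncated."""
    log_prod = 0.0
    for n in range(2, n_terms + 1):
        z = math.pi / n
        log_prod += math.log(z / math.sin(z))
    return math.sqrt(2.0) * math.exp(0.5 * log_prod)

def tail_approximation(y, n_terms=100_000):
    """Value of (6.3) at the point y of the pi^4 n B_n / 2 scale."""
    # For a standard normal X, P{X^2 > 2y} = erfc(sqrt(y)).
    return tail_constant(n_terms) * math.erfc(math.sqrt(y))

if __name__ == "__main__":
    for y in (2, 3, 4, 5, 6, 7):
        print(y, tail_approximation(float(y)))
```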







7. Some remarks on computations. The computation of the d.f. of a weighted 
sum of independent chi-square variables, such as that whose characteristic 
function is given by (2.3) or by (3.5), arises too frequently to require further 
mention of examples. Unfortunately, the computational techniques now avail- 
able in the literature for such problems are often extremely poor in applications. 
While the authors have no panacea to suggest, it does seem appropriate to make 
a few remarks whose content has proved helpful in considering the computations 
of the present and other papers (e.g., [6], [7]). 

A. Useful inequalities for estimating truncation error. In inverting expressions
like (2.3), it is usually convenient to work with a finite product, and it is
therefore necessary to have a bound on the error introduced by truncating the
infinite product. To this end, we consider the random variable

(7.1) $Z = \sum_{k=1}^{\infty} c_k Y_k,$

where $c_k > 0$ and the $Y_k$ are independent chi-square variables with one degree
of freedom (it will be obvious that the case where $Y_k$ has $n_k$ degrees of freedom
can be reduced to this case). We seek an upper bound on the quantity

(7.2) $p = P\{Z > \epsilon\},$

where $\epsilon > 0$. The usual Chebyshev inequality is not very good here, and any
of several modifications yields great improvement. The details of one such
modification will now be given. We have, for $0 < T < (2 \max_k c_k)^{-1}$,

(7.3) $p = P\{e^{TZ} > e^{\epsilon T}\} \le E e^{TZ}/e^{\epsilon T} = \exp\{-\epsilon T - \frac{1}{2} \sum_k \log(1 - 2Tc_k)\}.$

Thus, for given $\{c_k\}$ and $\epsilon$, the best bound of this type is achieved by minimizing
the expression in braces with respect to T. It is easier to obtain an explicit bound
by first invoking an inequality such as

(7.4) $-\log(1 - 2Tc_k) \le -(c_k/c^*) \log(1 - 2Tc^*),$

where $c^* = \max_k c_k$. Substituting the expression on the right side of (7.4) into
the last expression of (7.3) and then minimizing with respect to T, and writing
$S_i = \sum_k c_k^i$ and $\epsilon = S_1(1 + \delta)$, we obtain

(7.5) $P\{Z > (1 + \delta)S_1\} \le \exp\{-(S_1/2c^*)[\delta - \log(1 + \delta)]\}.$

This can be improved by using a sharper inequality in place of (7.4). For
example, the substitution

(7.6) $-\log(1 - 2Tc_k) \le 2Tc_k - (c_k^2/c^{*2})[\log(1 - 2Tc^*) + 2Tc^*]$

yields, in place of (7.5), the better bound

(7.7) $P\{Z > (1 + \delta)S_1\} \le \exp\{-\frac{S_2}{2c^{*2}}[\frac{\delta S_1 c^*}{S_2} - \log(1 + \frac{\delta S_1 c^*}{S_2})]\}.$







Further improvements can be made similarly. Of course, the usual Chebyshev 
inequality is 


(7.8) $P\{Z > (1 + \delta)S_1\} \le 1/(1 + \delta).$
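
To compare the three bounds for a given set of weights, one may proceed as in the following sketch. It assumes the forms of (7.5) and (7.7) obtained by carrying out the minimizations over T described above, with $S_1 = \sum c_k$, $S_2 = \sum c_k^2$ and $c^* = \max c_k$; the function names, and the finite cut-off used to build the illustrative weights $1/(\pi^4 j^2 k^2)$ with jk > 10, are assumptions made here.

```python
import math

def bound_7_5(c, delta):
    """Right side of (7.5) for weights c and excess delta > 0."""
    s1, c_star = sum(c), max(c)
    return math.exp(-(s1 / (2.0 * c_star)) * (delta - math.log1p(delta)))

def bound_7_7(c, delta):
    """Right side of (7.7), which uses the sharper inequality (7.6)."""
    s1, c_star = sum(c), max(c)
    s2 = sum(ck * ck for ck in c)
    a = delta * s1 * c_star / s2
    return math.exp(-(s2 / (2.0 * c_star ** 2)) * (a - math.log1p(a)))

def bound_7_8(delta):
    """Right side of (7.8)."""
    return 1.0 / (1.0 + delta)

if __name__ == "__main__":
    # Illustrative weights: 1/(pi^4 j^2 k^2) with jk > 10, cut off at j, k <= 200
    # (the cut-off is a hypothetical device to obtain a finite list).
    weights = [1.0 / (math.pi ** 4 * j * j * k * k)
               for j in range(1, 201) for k in range(1, 201) if j * k > 10]
    for delta in (0.5, 1.0, 2.0):
        print(delta, bound_7_5(weights, delta), bound_7_7(weights, delta), bound_7_8(delta))
```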


As an example, suppose we want to truncate the product in (2.3) by consider- 
ing only terms for which jk < 10. To estimate the error involved in doing this, 
we seek an upper bound on p where the set {c,} consists of the A, of Section 2 
for which jk > 10. Routine computations yield e~"” and e” for the bounds of 
(7.5) and (7.7), respectively, when 6 is small. In any event, we see that ¢ in 
(7.2) must be between $S_1$ and $2S_1$ for c of this sort, in order to make p fairly
small. Since $S_1 = .0043$ and since EB = .027 (where B is as in (2.3)), we can
only conclude that an approximate computation of the d.f. of B obtained by this
truncation, at a value x of the argument, may actually yield the true value of the
d.f. at a value as far away as x + .2EB, and this would probably be unsatisfactory.
A larger truncation value is thus indicated. If the value 10 determining
this truncation is increased to L, $S_1$ varies approximately inversely with L.

Since the ratio of $S_1$ to EB is the critical factor in determining the adequacy
of a truncation in computations like that just mentioned, and since $S_1$ often
decreases very slowly with increasing truncation value in such examples, a large
number of terms in the product (2.3) will have to be used for even fair accuracy. 
An improvement would probably result from substituting for the ignored terms 
a multiple of a chi-square variable with appropriate low moments, but it seems 
difficult to guarantee an appreciable improvement in accuracy in this way. We 
shall return to these considerations in Section 7C. 

B. Some methods of expansion and inversion. One of the most commonly used 
techniques for inverting characteristic functions of the form 


(7.9) $\prod_{j=1}^{k} (1 - a_j it)^{-m_j/2},$

where the $m_j$ are positive integers and the $a_j$ are positive, is that of Pitman and
Robbins [11]. Although this technique and variants of it which represent the 
solution in slightly different form are sometimes useful, these methods suffer 
from three defects in many problems: (1) the solution is given in the form of an 
infinite series which converges rather slowly ; (2) the terms of the series are quanti- 
ties such as incomplete gamma functions, which may not be convenient for some 
machine computations; (3) the methods do not distinguish simple cases for 
which a simple inversion in finite terms is possible. For a trivial example of (3), 
we note that, if k = 2, m = m. = 2, a, = 1, and a@ = 2, the distribution in 
question is immediately found by a routine convolution of two exponentials to 
be 2(e* — e ”), whereas the method of [11] expresses the result as the sum of 
an infinite series of incomplete gamma functions. 

This suggests that it will often be efficient to factor out of the expression (7.9)
the corresponding expression wherein each $\frac{1}{2}m_j$ is replaced by its integral part
$n_j$ (say), to expand $t^{-1} \prod (1 - a_j it)^{-n_j}$ into partial fractions (the extra factor
$t^{-1}$ being introduced so as to give the Fourier transform of the d.f. rather than







of the density), and then to invert term by term. Thus, for example, in inverting
the expression discussed in the paragraph following (7.8), we can factor out and
invert such an expression, leaving only the factors corresponding to $\lambda_{11}$, $\lambda_{22}$, and
$\lambda_{33}$; the d.f. corresponding to these terms must then be found by other means
and can then be convolved with the d.f. corresponding to the other terms. It
should also be noted that the partial fraction technique will often be easy to
apply in cases where (7.9) is replaced by an infinite product. For example, the
expression $t^{-1} \prod_{j=1}^{\infty} (1 + 2t/\pi^2 j^2)^{-1}$, which is the Laplace transform of the d.f.
"$B_2$" which was computed by other means in Section 4 of [7], can easily be
rewritten as $t^{-1} + \sum_{j=1}^{\infty} (-1)^j 4/[\pi^2 j^2 (1 + 2t/\pi^2 j^2)]$, which we can invert at once
to give, for x > 0,

$B_2(x) = 1 + 2 \sum_{j=1}^{\infty} (-1)^j e^{-\pi^2 j^2 x/2}.$

(Incidentally, this proves the following interesting relationship: if $W_1$ and $W_2$
are independent and each is distributed according to the limiting $n\omega_n^2$ distribution,
then $\pi(W_1 + W_2)^{1/2}/2$ is distributed according to the limiting Kolmogorov-
Smirnov distribution.)
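
The partial fraction identity just used can be checked numerically. The sketch below multiplies both forms of the transform by t, so that the infinite product becomes $(2t)^{1/2}/\sinh (2t)^{1/2}$ (a standard closed form, used here as the reference value) and the partial fraction form becomes $1 + \sum (-1)^j 4t/(\pi^2 j^2 + 2t)$. The function names and truncation points are assumptions of this illustration.

```python
import math

def transform_closed_form(t):
    """t times the transform: prod_j (1 + 2t/(pi^2 j^2))^{-1} = z / sinh(z), z = sqrt(2t)."""
    z = math.sqrt(2.0 * t)
    return z / math.sinh(z)

def transform_partial_fractions(t, terms=20_000):
    """t times the partial fraction form, with the alternating series truncated."""
    total = 1.0
    for j in range(1, terms + 1):
        total += (-1) ** j * 4.0 * t / (math.pi ** 2 * j * j + 2.0 * t)
    return total

def b2_cdf(x, terms=200):
    """Term-by-term inversion: 1 + 2 * sum_j (-1)^j exp(-pi^2 j^2 x / 2), for x > 0."""
    return 1.0 + 2.0 * sum((-1) ** j * math.exp(-0.5 * math.pi ** 2 * j * j * x)
                           for j in range(1, terms + 1))

if __name__ == "__main__":
    print(transform_closed_form(1.0), transform_partial_fractions(1.0))
    # With x = 4 * lam**2 / pi**2 the series is the Kolmogorov series at lam.
    print(b2_cdf(4.0 / math.pi ** 2))
```

The last line illustrates the relationship noted in the text: evaluating the inverted series at $x = 4\lambda^2/\pi^2$ gives the limiting Kolmogorov-Smirnov d.f. at $\lambda$.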

We must still discuss the inversion of general expressions like (7.9) or, with 
the aid of a factorization like that just discussed, of expressions like (7.9) with 
all m; = 1. There are many possible expansions akin to that of [11], and for 
the sake of brevity we shall illustrate only a few such possibilities in the simple 
case of (7.9) where k = 2, a, = 1, a = ¢ withO < c < 1, and m = m = 1. 
Writing ¢ for —it in (7.9) (i.e., working with the Laplace transform), this expres- 
sion becomes g(t) = (1 + t) (1 + ct). Factoring out (1 + ct)”, (1 + ct)”, 
or [1 + (1 + ’)t/2]", respectively, and then using the binomial expansion on 
the remaining factor, we obtain the three ee for q(t), 


ie, -™ c)”t? 
(@) = pak M6 GS oe 


a ie ee 
(b) at) = ae Gap 


15°) 25 
1 = Ei 
(c) g(t) = —— 2d a 
I+e l+c 
ES ee 


where c; = (2j)!/2°(j!)*. The second of these corresponds to the method of 
[11]. Thus we see that various expansions are available which differ in speed of 
convergence and difficulty of inversion. If suitable partial fraction or other 
routines are available for inverting the individual terms, an expression like 
(a) might be useful for some values of c; in other cases, (b) might be satisfactory. 
Without giving detailed calculations of examples, we can see how ill-advised it 
is always to use, mechanically, the same routine in every case. 
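
As a concrete instance of an expansion of this kind, one may factor $(1 + t)^{-1}$ out of $g(t) = (1 + t)^{-1/2}(1 + ct)^{-1/2}$ and expand the remaining factor by the binomial series, which gives $g(t) = \sum_j c_j (1 - c)^j t^j/(1 + t)^{j+1}$ with $c_j = (2j)!/2^{2j}(j!)^2$. This particular form is worked out here for illustration and is not claimed to coincide with any one of (a), (b), (c) above; the function names are ours.

```python
import math

def g_exact(t, c):
    """g(t) = (1 + t)^(-1/2) (1 + ct)^(-1/2)."""
    return 1.0 / math.sqrt((1.0 + t) * (1.0 + c * t))

def g_series(t, c, terms=60):
    """Binomial expansion after factoring out (1 + t)^(-1)."""
    ratio = (1.0 - c) * t / (1.0 + t)     # expansion variable, below 1 for t > 0
    total, coeff, power = 0.0, 1.0, 1.0   # coeff holds c_j = (2j)! / (2^(2j) (j!)^2)
    for j in range(terms):
        total += coeff * power
        coeff *= (2 * j + 1) / (2 * j + 2)   # c_{j+1} = c_j (2j + 1)/(2j + 2)
        power *= ratio
    return total / (1.0 + t)

if __name__ == "__main__":
    for c in (0.1, 0.5, 0.9):
        print(c, g_exact(1.0, c), g_series(1.0, c, terms=20))
```

Running the example with a small number of terms shows the dependence of the speed of convergence on c: the series converges quickly when c is near 1 and slowly when c is near 0.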

C. Other inversion techniques. Because of the large number of terms which must 







be kept in (2.3) in order to obtain reasonable accuracy (as discussed in Section 
7A) when applying the techniques we have discussed, and because of the other 
shortcomings of these methods (see Section 7B), it is reasonable to investigate 
other inversion techniques. For example, in the problem of Section 2, if we first
take the product with respect to k, we obtain $\prod_j \{\sinh[(2t)^{1/2}/\pi j] / [(2t)^{1/2}/\pi j]\}^{-1/2}$
for the Laplace transform, and one can try various manipulations with this
expression. Another possibility, which seems more fruitful in this and many other 
problems, is that of direct numerical integration to invert the expression of (2.3). 

In order to perform such an integration, one must first tabulate the function 
(2.3) for various values of the argument. A method which seems to be much more 
efficient than that of directly multiplying together an appropriately large number 


of terms of the product is to use the fact that, in a neighborhood of v = 0, we
have

(7.11) $-\frac{1}{2} \sum_{j \ge h} \log(1 + v/j^2) = \sum_{k=1}^{\infty} a_k v^k,$

where $a_k = (-1)^k (\sum_{j \ge h} j^{-2k})/2k$ (these coefficients can be written in terms of
Bernoulli numbers). On the basis of preliminary estimates of


gv) = 0 T]a5 (1 + 0/7?) + 


on the proposed line of integration, the value of h can be chosen so as to make 
the series (7.11) convergent over that (finite) portion of the line where the 
integration will actually be performed. The series cah then be evaluated for 
appropriate complex v, exponentiated, and the result multiplied by the remaining 


factor of g(v), which can be expressed in terms of hyperbolic sines and of powers 
of linear functions of v. The numerical integration can then be performed. This 
was the method used to obtain the tables of Section 8. 
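
The series (7.11) can be checked against direct summation of the logarithms, for complex v as well as real. In the sketch below both sides are truncated at the same hypothetical upper limit `upper` on j, so that the identity holds exactly for the finite sums; the function names and the choice of test point are assumptions of this illustration. Convergence of the power series requires $|v| < h^2$.

```python
import cmath

def log_tail_direct(v, h, upper):
    """-(1/2) * sum_{j=h}^{upper} log(1 + v/j^2), summed term by term."""
    return -0.5 * sum(cmath.log(1.0 + v / (j * j)) for j in range(h, upper + 1))

def log_tail_series(v, h, upper, terms=30):
    """The same quantity from the power series with coefficients a_k of (7.11)."""
    total = 0.0
    for k in range(1, terms + 1):
        power_sum = sum(float(j) ** (-2 * k) for j in range(h, upper + 1))
        a_k = (-1) ** k * power_sum / (2 * k)
        total += a_k * v ** k
    return total

if __name__ == "__main__":
    v = 2.0 + 1.0j            # a point on a hypothetical line of integration
    print(log_tail_direct(v, 5, 1000))
    print(log_tail_series(v, 5, 1000))
```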

A recent paper by Grenander, Pollak, and Slepian [3] discusses an interesting 
computational technique for obtaining an approximation to limiting distributions 
such as those discussed above by solving a set of linear equations whose solution 
approximates that of an integral equation for the limiting df. or c.f. The reader 
is referred to [3] for details and related discussion. 


8. Tables. The inversion of (2.3) was carried out by the method outlined in the
second paragraph of Section 7C, which was calculated to require much less
machine time than any of the other available methods. The authors are grateful
to Professor R. J. Walker for carrying out the computations on the Cornell
Computing Center's 220. Table I gives values (under $H_0$) of

$F(x) = \lim_{n \to \infty} P\{\frac{1}{2}\pi^4 nB_n \le x\},$

while Table II gives values of $F^{-1}(p)$.

It is not very difficult to program a computing machine to evaluate the
statistic $B_n$ or the modifications of it mentioned in Section 4. It may be
worthwhile, especially for small n, to reduce the error introduced when using the







TABLE I
$F(y) = \lim_{n \to \infty} P_{H_0}\{\frac{1}{2}\pi^4 nB_n \le y\}$





: oad 4 
F(y) | F(y) | F(y) 


.87275 

88084 
88835 
.89534 
-90185 
.90791 
-91357 
-91885 
-92377 
-92838 
.93268 
.93670 
.94047 
-94400 
-94730 
-95039 
95329 
- 95602 
-95857 
96097 
-96322 
-96533 
-96732 
-96918 
-97094 
.97259 
-97414 
-97561 
- 97698 
-97828 
-97949 
- 98064 
-98172 
98274 
-98370 
.98461 


-00000 
.00010 
.00086 
.00389 
.01158 
.02614 
.04867 
.07899 
. 11594 
. 15784 
. 20293 
. 24960 
. 29652 
. 34267 
. 38730 
.42994 
.47027 
.50816 
.54354 
.57645 
. 60697 
.63521 
.66131 
.68540 
. 70763 
.72813 
. 74704 
. 76449 
. 78060 
. 79547 
.82193 
. 83369 
.84459 
. 85469 
. 86406 


.98546 
.98627 
-98702 
.98774 
.98841 
-.98905 
-98965 
. 99022 
.99075 
.99126 
-99174 
-99219 
-99261 
-99301 
.99339 
.99375 
.99409 
-99441 
.99471 
.99499 
-99527 
-99552 
.99576 




2. 
2. 
2.3 
2. 
2.¢ 
2.% 
2. 
2. 
2. 
2.6 
2. 
2. 
2. 
2. 
28 
2.9 
2. 
3. 
3. 
3. 
3. 
3. 
3. 
3.¢ 




.99755 
99918 
- 99952 
.99972 
99983 
.99990 
. 99994 
-99997 
-99998 
-99999 
1.00000 





















TABLE II 


$F^{-1}(p)$








2.286 
2.844 
3.622 
4.230 
4.851 










limiting d.f. (in particular, in the limiting covariance function) by using
$(n - 1)B_n$ instead of $nB_n$.


REFERENCES 

[1] T. W. Anderson and D. A. Darling, "Asymptotic theory of certain 'goodness of fit'
criteria based on stochastic processes," Ann. Math. Stat., Vol. 23 (1952), pp.
193-212.

[2] G. Doetsch, Theorie und Anwendung der Laplace Transformation, Dover Pub., New
York, 1943.

[3] U. Grenander, H. O. Pollak, and D. Slepian, "The distribution of quadratic forms
in normal variates," J. S. I. A. M., Vol. 7 (1959), pp. 374-401.

[4] Wassily Hoeffding, "A nonparametric test of independence," Ann. Math. Stat.,
Vol. 19 (1948), pp. 546-557.

[5] M. Kac, "On some connections between probability theory and differential and
integral equations," Proceedings of the Second Berkeley Symposium on Mathematical
Statistics and Probability, University of California Press, 1951, pp. 180-
215.

[6] M. Kac, J. Kiefer, and J. Wolfowitz, "On tests of normality and other tests of
goodness of fit based on distance methods," Ann. Math. Stat., Vol. 26 (1955),
pp. 189-211.

[7] J. Kiefer, "K-sample analogues of the Kolmogorov-Smirnov and Cramér-v. Mises
tests," Ann. Math. Stat., Vol. 30 (1959), pp. 420-447.

[8] J. Kiefer and J. Wolfowitz, "On the deviations of the empiric distribution function
of vector chance variables," Trans. Amer. Math. Soc., Vol. 87 (1958), pp. 173-
186.

[9] J. Kiefer and J. Wolfowitz, "Asymptotic minimax character of the sample distribution
function for vector chance variables," Ann. Math. Stat., Vol. 30 (1959),
pp. 463-489.

[10] H. B. Mann and A. Wald, "On the choice of the number of class intervals in the
application of the chi square test," Ann. Math. Stat., Vol. 13 (1942), pp. 306-
317.

[11] E. J. G. Pitman and Herbert Robbins, "Application of the method of mixtures of
quadratic forms in normal variates," Ann. Math. Stat., Vol. 20 (1949), pp. 552-
560.

[12] M. Rosenblatt, "Limit theorems associated with variants of the von Mises statistic,"
Ann. Math. Stat., Vol. 23 (1952), pp. 617-623.





SOME EXACT RESULTS FOR ONE-SIDED DISTRIBUTION TESTS OF
THE KOLMOGOROV-SMIRNOV TYPE¹


By P. WHITTLE


Statistical Laboratory, University of Cambridge 


0. Summary. I consider the calculation of the probability $P_n$ that the graph
of a sample distribution function lie wholly to one side of a given arbitrary
contour. A generating function approach is described in Section 2, and $P_n$
calculated exactly for some simple types of contour. Upper and lower bounds of
the correct asymptotic form (relations (14), (15)) are obtained for $P_n$, in the
case of a straight line contour.


1. Introduction. Assume, as usual, that the observations are distributed
rectangularly in (0, 1), and so have distribution function

$F(x) = x \quad (0 \le x \le 1).$

Let the sample consist of n ordered observations

$0 \le x_1 \le x_2 \le \cdots \le x_n \le 1,$

and let $F_n(x)$ denote the sample distribution function

$F_n(x) = \sum_{x_j \le x} \frac{1}{n}.$

My aim is to use the methods developed by Wald and Wolfowitz [7] and
Daniels [3], and previously applied by Birnbaum and Tingey [1], to obtain an
exact calculation of the probability $P_n = \Pr[F_n(x) \le G(x); 0 \le x \le 1]$ for
certain functions G(x).


2. General formulae. Suppose that the function G(x) is monotone non-decreasing.
Then we can uniquely define a non-decreasing sequence of constants
$a_j$ by

(1) $a_j = G^{-1}(j/n), \quad (j = 1, 2, \cdots, n)$

and we can rewrite $\Pr[F_n(x) \le G(x)]$ as

$P_n = \Pr(x_1 \ge a_1, x_2 \ge a_2, \cdots, x_n \ge a_n).$

Following Wald and Wolfowitz [7], let us introduce the polynomials,

$P_0(x) = 1,$

$P_j(x) = \int_{a_j}^{x} dx_j \int_{a_{j-1}}^{x_j} dx_{j-1} \cdots \int_{a_1}^{x_2} dx_1, \quad (j \ge 1)$

Received June 1, 1959; revised July 1, 1960. 
1 This paper was written while the author was a member of the Applied Mathematics 
Laboratory, Department of Scientific and Industrial Research, Wellington, New Zealand. 




which are related to one another and to the probability $P_n$ by the equations,

(2) $P_j(x) = \int_{a_j}^{x} P_{j-1}(u) \, du,$

(3) $P_n = n! \, P_n(1).$

Since $P_j(x)$ is a polynomial in x of order j exactly, and

$P_j'(x) = P_{j-1}(x), \quad (j = 1, 2, \cdots)$

the $P_j(x)$'s constitute an Appell set of polynomials ([4] p. 235) and their formal
generating function,

(4) $T(\theta) = \sum_{j=0}^{\infty} P_j(x)\theta^j,$

is of the form

(5) $T(\theta) = A(\theta)e^{\theta x},$

where $A(\theta)$ is the formal generating function of the constant terms in the
polynomials. ("Formal" in the sense that we are concerned only with relations
between the first few terms in the expansions of $A(\theta)$, $T(\theta)$, and not in the
convergence of these expansions.)

$A(\theta)$ (or at least its first n + 1 terms) can be regarded as being determined
(for a given G(x)) by the conditions that $P_0 = 1$ and that $a_j$ be the greatest
real zero of $P_j(x)$, $(j = 1, 2, \cdots, n)$. (Equation (2) shows that $a_j$ is a zero of
$P_j(x)$, and it must be the greatest zero, since $P_j(x)$ is intrinsically positive for
$x > a_j$.)

It follows from equations (1), (2) and (3) that

(6) $\frac{P_n}{n!} = \text{coefficient of } \theta^n \text{ in } A(\theta)e^{\theta}.$
Rather than regarding A(@) as being determined by G(x), it may be simpler 
to prescribe A(@), calculate the a,’s from the relation 

P ;(a;) = 0, (j= 1,2---n) 


and so effectively determine G(x). This we proceed to do for some simple cases 
in the next section. 
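
For a numerically given sequence $a_1 \le \cdots \le a_n$, relations (2) and (3) can also be applied directly. The following sketch (function names are ours, and floating-point arithmetic is assumed adequate for moderate n) stores each polynomial by its coefficients, integrates from $a_j$ as in (2), and evaluates $n! P_n(1)$ as in (3).

```python
from math import factorial

def integrate_from(coeffs, lower):
    """Coefficients of the integral from `lower` to x of sum(coeffs[k] * x**k)."""
    integral = [0.0] + [c / (k + 1) for k, c in enumerate(coeffs)]
    # Choose the constant term so that the integral vanishes at x = lower.
    integral[0] = -sum(c * lower ** k for k, c in enumerate(integral))
    return integral

def one_sided_probability(a):
    """P_n = Pr(x_1 >= a_1, ..., x_n >= a_n) for a non-decreasing sequence a in [0, 1]."""
    poly = [1.0]                               # P_0(x) = 1
    for a_j in a:
        poly = integrate_from(poly, a_j)       # relation (2)
    return factorial(len(a)) * sum(poly)       # relation (3): n! P_n(1)

if __name__ == "__main__":
    n = 5
    print(one_sided_probability([j / (n + 1) for j in range(1, n + 1)]))
```

For the sequence $a_j = j/(n + 1)$ used in the example, the result can be compared with the value $1 - a_n$ given by formula (9) of the next section with m = 1.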


3. Some special cases. If $A(\theta)$ is of the form

$A(\theta) = \sum_{k=0}^{m} A_k \theta^k,$

then $a_j$ will be the greatest root of the equation

(7) $\sum_{k=0}^{\min(j,m)} A_k \frac{x^{j-k}}{(j-k)!} = 0.$

As a special case, consider

$A(\theta) = 1 - B\theta^m$ (m integral, B > 0),







for which it follows from (6) and (7) that we shall have

(8) $a_j = [Bj(j - 1)(j - 2) \cdots (j - m + 1)]^{1/m}, \quad (j = 1, 2, \cdots, n),$

(9) $P_n = 1 - a_n^m.$

B and m are, of course, so chosen that $a_n \le 1$. In the figure we have taken
coordinates x, y = G(x), and the upper heavy contour is a curve drawn through
the points

$x = a_j,$
$y = G(a_j) = j/n,$

where $a_j$ is given by formula (8). The contour initially rises vertically (to the
point corresponding to j = m - 1) after which it rises convexly to the x-axis and is
quickly asymptotic to the straight line $a_j = B^{1/m}(j - \frac{1}{2}(m - 1))$. Seeing that
one expects the greatest deviations midway in the range x = (0, 1], it is reasonable




Fig. 1. The upper and lower curves are those described by equations (8) and (10)
respectively, with the following numerical values of the parameters: n = 10; m = 4, B = 0.09651;
$\beta$ = 0.3767, $\gamma$ = 3.723. It will be seen that both curves are quickly convergent to a
common straight line, and that for both curves $s = 1 - a_{10} = 0.1649$. The probabilities, $P_{10}$, that
an empirical distribution curve should lie completely below the upper or the lower curve
are 0.5136 and 0.4192 respectively, by equations (9) and (11). The Kolmogorov
probability for the region bounded by the straight line asymptote is $1 - e^{-2ns^2} = 0.4195$.







to choose a function which is concave, and it is unfortunate that the present
calculations have led to a G(x) which is convex. However, we shall find a use for
formulae (8), (9) in the next section.

The scale constant B can be chosen according to several criteria. For instance,
we should obtain a fairly symmetric contour if we so chose B that

$a_n = 1 - m/(n + 1).$

Varying m, we should then obtain a nested sequence of contours, beginning at
m = 1 with $a_j = j/(n + 1) = E(x_j)$. Note that formula (9) for the special
case m = 1 is to be found in the article by Daniels [3] and is also a special case
of Lemma 1 in a recent paper by Pyke [6].

Another elementary choice for $A(\theta)$ is

$A(\theta) = \sum_r R_r e^{\beta_r \theta}.$

In this case $a_j$ will be the largest root of

$\sum_r R_r (x + \beta_r)^j = 0.$

Thus, the particular case

$A(\theta) = (e^{\gamma} - e^{\beta\theta})/(e^{\gamma} - 1)$

yields

(10) $a_j = \beta/(e^{\gamma/j} - 1),$
(11) $P_n = [e^{\gamma} - (1 + \beta)^n]/(e^{\gamma} - 1).$

Again, $\beta$, $\gamma$ must be so chosen that $a_n \le 1$. Formula (10) leads to the lower
contour of the figure, which rises rapidly and concavely from zero and is soon
asymptotic to a straight line. By varying the constants $\beta$ and $\gamma$ one can obtain
various families of nested contours whose shape would seem to be an improvement
on the straight line contour often adopted.
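
The two families of contours and the corresponding probabilities are simple to evaluate. The sketch below assumes the readings $a_j = [Bj(j-1)\cdots(j-m+1)]^{1/m}$, $P_n = 1 - a_n^m$ for (8), (9) and $a_j = \beta/(e^{\gamma/j} - 1)$, $P_n = [e^{\gamma} - (1+\beta)^n]/(e^{\gamma} - 1)$ for (10), (11); the function names are ours. In the example the constant B of the upper contour is not taken from the figure legend but is chosen, as an assumption of this illustration, so that both contours end at the same point $a_n$.

```python
import math

def upper_contour(j, B, m):
    """a_j of formula (8); zero for j <= m - 1."""
    return (B * math.prod(j - k for k in range(m))) ** (1.0 / m)

def upper_probability(n, B, m):
    """P_n of formula (9)."""
    return 1.0 - upper_contour(n, B, m) ** m

def lower_contour(j, beta, gamma):
    """a_j of formula (10)."""
    return beta / math.expm1(gamma / j)

def lower_probability(n, beta, gamma):
    """P_n of formula (11)."""
    return (math.exp(gamma) - (1.0 + beta) ** n) / math.expm1(gamma)

if __name__ == "__main__":
    n, m, beta, gamma = 10, 4, 0.3767, 3.723
    a_n = lower_contour(n, beta, gamma)
    s = 1.0 - a_n
    B = a_n ** m / math.prod(n - k for k in range(m))   # forces upper a_n = lower a_n
    print("s =", s)
    print("upper:", upper_probability(n, B, m), "lower:", lower_probability(n, beta, gamma))
    print("1 - exp(-2 n s^2) =", 1.0 - math.exp(-2.0 * n * s * s))
```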


4. Bounds for the significance points of the Kolmogorov test. Suppose m and
B can be chosen in formula (8), and $\beta$ and $\gamma$ in formula (10), such that in both
cases

(12) $a_n = 1 - s,$

(13) $(\partial a_j/\partial j)_{j=n} = 1/n,$

where s is prescribed, and the derivative is defined by considering j as a
continuous variable in the expressions given for $a_j$. On account of the respective
convexity and concavity of the two curves, the contours (1) will respectively lie
completely above and completely below the line y = x + s in the square
$0 \le x, y \le 1$. The situation is presented in the figure. That is, the two values of
$P_n$ constitute respectively upper and lower bounds for the probability

$\Pr[\max [F_n(x) - F(x)] \le s] = 1 - Q_n(s).$







We shall press this calculation somewhat further to obtain the following
THEOREM:

(14) $Q_n(s) = \Pr[\max [F_n(x) - F(x)] > s] > (1 - s)^{2ns+2}$ or $> (1 - s)^2 e^{-2ns^2 - ns^3/(1-s)},$

(15) $Q_n(s) < e^{-2ns^2 + 3.83n[s/(1-s)]^3}.$

These two inequalities together are obviously sufficient to prove the known
result that, for fixed t,

$\lim_{n \to \infty} Q_n(t/n^{1/2}) = e^{-2t^2}.$

However, our methods are not strong enough to prove the conjecture, made in
[2], that $Q_n(s) < e^{-2ns^2}$. For comparison, note should be made of the result,
proved in [5], that there exists a constant c, independent of s and n, such that
$Q_n(s) < ce^{-2ns^2}$.
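
The bounds of the theorem can be compared with the exact value of $Q_n(s)$ for small n. The sketch below is illustrative: it assumes the readings $(1 - s)^{2ns+2}$ for the lower bound (14) and $\exp\{-2ns^2 + 3.83n[s/(1-s)]^3\}$ for the upper bound (15), and it obtains the exact value from relations (2), (3) of Section 2 with the straight line contour G(x) = x + s, that is, $a_j = \max(0, j/n - s)$. The function names are ours.

```python
import math

def lower_bound_14(n, s):
    """First form of the lower bound (14)."""
    return (1.0 - s) ** (2.0 * n * s + 2.0)

def upper_bound_15(n, s):
    """Upper bound (15), stated for the range where s does not exceed about 0.31."""
    return math.exp(-2.0 * n * s * s + 3.83 * n * (s / (1.0 - s)) ** 3)

def exact_q(n, s):
    """Q_n(s) = 1 - n! P_n(1), with P_j built by repeated integration from a_j."""
    poly = [1.0]                                        # P_0(x) = 1
    for j in range(1, n + 1):
        a_j = max(0.0, j / n - s)                       # contour G(x) = x + s
        integral = [0.0] + [c / (k + 1) for k, c in enumerate(poly)]
        integral[0] = -sum(c * a_j ** k for k, c in enumerate(integral))
        poly = integral                                 # relation (2)
    return 1.0 - math.factorial(n) * sum(poly)          # relation (3)

if __name__ == "__main__":
    n, s = 10, 0.2
    print(lower_bound_14(n, s), exact_q(n, s), upper_bound_15(n, s))
    print("exp(-2 n s^2) =", math.exp(-2.0 * n * s * s))
```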

To prove (14), we take the $a_j$ sequence (8). If we use condition (12) to determine
B, then we find that equations (13) and (9) amount, if m is integral, to

(16) $\frac{1}{m}[\frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{n-m+1}] = \frac{1}{n(1-s)},$

(17) $1 - P_n = (1 - s)^m.$

By eliminating m from these two equations, we obtain an expression for $1 - P_n$
which constitutes a lower bound for the probability $Q_n(s)$. If s is such that there
is indeed an integral m satisfying (16) then

$1 - s = m [n \sum_{k=0}^{m-1} (n - k)^{-1}]^{-1} \le m^{-1} \sum_{j=0}^{m-1} n^{-1}(n - j) = 1 - (m - 1)/(2n),$

so that

$m \le 2ns + 1.$

If (16) cannot be satisfied by some integral m, then if we are to err on the
conservative side we must increase s until such a solution can be found. However,
since this procedure will not increase m by as much as unity we shall have, under
all circumstances,

(18) $m < 2ns + 2.$







Substituting (18) into (17) we obtain

$1 - P_n > (1 - s)^{2ns+2} = (1 - s)^2 (1 - s)^{2ns},$

from which (14) follows immediately.

To establish (15), we take the $a_j$ sequence (10). Solving for $\beta$ from (12) and
substituting in (13), (11) we obtain

(19) $s = (e^{-2\phi} - (1 - 2\phi))/(2\phi),$
(20) $1 - P_n = [(1 + (2/\phi) \sinh^2 \phi)^n - 1]/(e^{2n\phi} - 1),$

where $\phi = \gamma/(2n)$. Eliminating $\phi$ from (19), (20) we shall obtain an expression
for $1 - P_n$ which majorises $Q_n(s)$.

If two positive quantities c, d satisfy $c/d \le 1$, then it is certainly true that
$c/d \le (c + 1)/(d + 1)$. We thus deduce from (20) that

$1 - P_n \le (1 + (2/\phi) \sinh^2 \phi)^n e^{-2n\phi}.$

Let us restrict ourselves to the range

(21) $0 \le \phi \le 0.4.$

Now,

$\sinh \phi = \phi + R_1,$
where 
R; < gig” ™. 


Thus

$1 + (2/\phi) \sinh^2 \phi = 1 + 2\phi + R_2,$

where

(22) $R_2 = (2/\phi)(2R_1\phi + R_1^2) < 0.6811\phi^3,$

in view of (21). It also follows from (21), (22) that

$0 \le 2\phi + R_2 \le 1,$

so that

$\log(1 + 2\phi + R_2) = (2\phi + R_2) - \frac{1}{2}(2\phi + R_2)^2 + R_3 = 2\phi - 2\phi^2 + R_4,$

where

$|R_3| < \frac{1}{3}|2\phi + R_2|^3,$
$|R_4| = |R_2(1 - 2\phi) - \frac{1}{2}R_2^2 + R_3| \le R_2 + \frac{1}{2}R_2^2 + |R_3| < 3.83\phi^3.$





Thus

(23) $1 - P_n \le e^{-2n\phi^2 + 3.83n\phi^3}.$

Turning now to the relation between $\phi$ and s, we note that if the function of $\phi$
in the right-hand member of (19) is denoted $f(\phi)$, then f is monotone and

$\phi/(1 + \phi) \le f(\phi) \le \phi.$

Thus

(24) $s \le \phi \le s/(1 - s).$

Relation (15) now follows immediately from (23), (24). It is easily shown that
condition (21) ensures that $0 \le s \le 0.31$.


REFERENCES 


[1] Z. W. Birnbaum and Fred H. Tingey, "One sided contours for probability distribution
functions," Ann. Math. Stat., Vol. 22 (1951), pp. 592-596.

[2] Z. W. Birnbaum and R. C. McCarty, "A distribution-free upper confidence bound
for Pr(Y < X), based on independent samples of X and Y," Ann. Math. Stat.,
Vol. 29 (1958), pp. 558-562.

[3] H. E. Daniels, "The statistical theory of the strengths of bundles of threads," Proc.
Roy. Soc. London, Ser. A, Vol. 183 (1945), pp. 405-435.

[4] A. Erdélyi, et al, Higher Transcendental Functions, Vol. 3, McGraw-Hill Book Co.,
New York, 1953.

[5] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, "Asymptotic minimax character of
the sample distribution function and of the classical multinomial estimator,"
Ann. Math. Stat., Vol. 27 (1956), pp. 642-669.

[6] Ronald Pyke, "The supremum and infimum of the Poisson process," Ann. Math. Stat.,
Vol. 30 (1959), pp. 568-576.

[7] A. Wald and J. Wolfowitz, "Confidence limits for continuous distribution functions,"
Ann. Math. Stat., Vol. 10 (1939), pp. 105-118.





SOME EXTENSIONS OF THE WALD-WOLFOWITZ-NOETHER 
THEOREM 


By JAROSLAV HÁJEK


Mathematical Institute of the Czechoslovak Academy of Sciences 


1. Summary and introduction. Let $(R_{\nu 1}, \cdots, R_{\nu N_\nu})$ be a random vector
which takes on the $N_\nu!$ permutations of $(1, \cdots, N_\nu)$ with equal probabilities.
Let $\{b_{\nu i}, 1 \le i \le N_\nu, \nu \ge 1\}$ and $\{a_{\nu i}, 1 \le i \le N_\nu, \nu \ge 1\}$ be double
sequences of real numbers. Put

(1.1) $S_\nu = \sum_{i=1}^{N_\nu} b_{\nu i} a_{\nu R_{\nu i}}.$

We shall prove that the sufficient and necessary condition for asymptotic
$(N_\nu \to \infty)$ normality of $S_\nu$ is of Lindeberg type. This result generalizes previous
results by Wald-Wolfowitz [1], Noether [3], Hoeffding [4], Dwass [6], [7] and 
Motoo [8]. In respect to Motoo [8] we show, in fact, that his condition, applied 
to our case, is not only sufficient but also necessary. 

Cases encountered in rank-test theory are studied in more detail in Section 6 
by means of the theory of martingales. The method of this paper consists in 
proving asymptotic equivalency in the mean of (1.1) to a sum of infinitesimal 
independent components. 


2. Three lemmas. Consider a sequence $U_1, \cdots, U_N$ of independent random
variables each having uniform (rectangular) distribution over the interval
(0, 1]. Let $R_i$ be the rank of $U_i$, i.e.,

(2.1) $U_i = Z_{R_i},$

where $Z_1 < \cdots < Z_N$ is the sequence $U_1, \cdots, U_N$ reordered in ascending
magnitude.

Take a nondecreasing sequence $a_1 \le \cdots \le a_N$ of real numbers and put

(2.2) $a(\lambda) = a_i$ for $(i - 1)/N < \lambda \le i/N$ $(1 \le i \le N).$

The function $a(\lambda)$ will be called a quantile function of $a_1 \le \cdots \le a_N$. As
$(i - 1)/N < i/(N + 1) < i/N$, we have

(2.3) $a_i = a(i/N) = a[i/(N + 1)].$


Furthermore, 


t=tda= [ana and e= 52 (a; — a)’ 


t=l 


(2.4) 1 
= [ fa - ata 


Received January 12, 1960; revised August 13, 1960.


Lemma 2.1.

(2.5) $E[a(U_1) - a(R_1/N)]^2 \le 2\sqrt{2}\, \max_{1 \le i \le N} |a_i - \bar a| \cdot \frac{1}{N} \Big[\sum_{i=1}^{N} (a_i - \bar a)^2\Big]^{1/2},$

where the function $a(\cdot)$ is given by (2.2).
Proof. If $Z_1, \ldots, Z_N$ are fixed, then $U_1$ takes on each of the values $Z_i$ with
probability $1/N$, and $U_1 = Z_i$ is equivalent to $R_1 = i$. Therefore,

(2.6) $E[a(U_1) - a(R_1/N)]^2 = E\,E\{[a(U_1) - a(R_1/N)]^2 \mid Z_1, \ldots, Z_N\} = (1/N)\, E \sum_{i=1}^{N} [a(Z_i) - a(i/N)]^2.$
Now, first, consider a special quantile function

(2.7) $\varepsilon[\lambda - (k/N)] = 0$ if $\lambda \le k/N$, $\quad = 1$ if $\lambda > k/N$.

The quantile function $\varepsilon[\lambda - (k/N)]$ corresponds, obviously, to the sequence
$a_1 = \cdots = a_k = 0$, $a_{k+1} = \cdots = a_N = 1$. Let $K$ denote the number of the
$U_i$'s smaller than $k/N$. Clearly, $Z_K < k/N < Z_{K+1}$. If $K \le k$, we have

(2.8) $\varepsilon[Z_i - (k/N)] - \varepsilon[(i - k)/N] = 0$ if $i = 1, \ldots, K, k+1, \ldots, N$, $\quad = 1$ otherwise,

so that

(2.9) $\sum_{i=1}^{N} \{\varepsilon[Z_i - (k/N)] - \varepsilon[(i - k)/N]\}^2 = |K - k|.$

We can easily see that (2.9) also holds for $K \ge k$. The result (2.9) together with
(2.6) gives

(2.10) $E\{\varepsilon[U_1 - (k/N)] - \varepsilon[(R_1 - k)/N]\}^2 = \frac{1}{N} E|K - k|.$

The distribution of $K$ is, obviously, binomial with mean value $k$ and variance
$k[1 - (k/N)]$, so that

$E|K - k| \le [E(K - k)^2]^{1/2} = \{k[1 - (k/N)]\}^{1/2},$
and, therefore,
(2.11) $E\{\varepsilon[U_1 - (k/N)] - \varepsilon[(R_1 - k)/N]\}^2 \le N^{-1} \{k[1 - (k/N)]\}^{1/2}.$


Now let us only suppose that $a_1 = 0$, and otherwise the sequence $a_1 \le \cdots \le a_N$
can be arbitrary. The quantile function of any such sequence may be expressed
in the form

(2.12) $a(\lambda) = \sum_{k=1}^{N-1} (a_{k+1} - a_k)\, \varepsilon[\lambda - (k/N)]$ $\quad (a_1 = 0,\ 0 < \lambda \le 1)$.

Actually, e.g., for $\lambda = i/N$, we have

$a(i/N) = \sum_{k=1}^{N-1} (a_{k+1} - a_k)\, \varepsilon[(i - k)/N] = \sum_{k=1}^{i-1} (a_{k+1} - a_k) = a_i$ $\quad (1 \le i \le N)$.


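Now from (2.12) it follows that, first,

(2.13) $\sum_{i=1}^{N} a_i^2 = \sum_{k=1}^{N-1} \sum_{j=1}^{N-1} (a_{k+1} - a_k)(a_{j+1} - a_j) \sum_{i=1}^{N} \varepsilon[(i - k)/N]\, \varepsilon[(i - j)/N] = \sum_{k=1}^{N-1} \sum_{j=1}^{N-1} (a_{k+1} - a_k)(a_{j+1} - a_j)(N - \max(k, j)),$

and, second,

(2.14) $[a(Z_i) - a(i/N)]^2 = \sum_{k=1}^{N-1} \sum_{j=1}^{N-1} (a_{k+1} - a_k)(a_{j+1} - a_j) \{\varepsilon[Z_i - (k/N)] - \varepsilon[(i - k)/N]\} \{\varepsilon[Z_i - (j/N)] - \varepsilon[(i - j)/N]\}.$

Because $\varepsilon[Z_i - (k/N)] - \varepsilon[(i - k)/N]$ and $\varepsilon[Z_i - (j/N)] - \varepsilon[(i - j)/N]$
take on only the values 0 and $\pm 1$, we have

(2.15) $\{\varepsilon[Z_i - (k/N)] - \varepsilon[(i - k)/N]\} \{\varepsilon[Z_i - (j/N)] - \varepsilon[(i - j)/N]\} \le \{\varepsilon[Z_i - (\max(k, j)/N)] - \varepsilon[(i - \max(k, j))/N]\}^2.$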


On combining (2.14), (2.15) and (2.11), we get

$E[a(U_1) - a(R_1/N)]^2 = \frac{1}{N} E \sum_{i=1}^{N} [a(Z_i) - a(i/N)]^2$

$\le \sum_{k=1}^{N-1} \sum_{j=1}^{N-1} (a_{k+1} - a_k)(a_{j+1} - a_j)\, E\{\varepsilon[U_1 - (\max(k, j)/N)] - \varepsilon[(R_1 - \max(k, j))/N]\}^2$

$\le \frac{1}{N} \sum_{k=1}^{N-1} \sum_{j=1}^{N-1} (a_{k+1} - a_k)(a_{j+1} - a_j) \{\max(k, j)[1 - (\max(k, j)/N)]\}^{1/2}$

$\le \frac{1}{N} \sum_{k=1}^{N-1} \sum_{j=1}^{N-1} (a_{k+1} - a_k)(a_{j+1} - a_j) [N - \max(k, j)]^{1/2}$

$\le \frac{1}{N} \Big\{\sum_{k=1}^{N-1} \sum_{j=1}^{N-1} (a_{k+1} - a_k)(a_{j+1} - a_j)\Big\}^{1/2} \Big\{\sum_{k=1}^{N-1} \sum_{j=1}^{N-1} (a_{k+1} - a_k)(a_{j+1} - a_j)(N - \max(k, j))\Big\}^{1/2}.$

According to (2.13) and to

$\sum_{k=1}^{N-1} \sum_{j=1}^{N-1} (a_{k+1} - a_k)(a_{j+1} - a_j) = a_N^2,$

the last expression equals $N^{-1} a_N (\sum_{i=1}^{N} a_i^2)^{1/2}$. This means that, for $a_1 = 0$,

(2.16) $E[a(U_1) - a(R_1/N)]^2 \le \frac{1}{N}\, a_N \Big[\sum_{i=1}^{N} a_i^2\Big]^{1/2}.$


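Generally, for an arbitrary $a_1$, we have

(2.17) $E[a(U_1) - a(R_1/N)]^2 \le \frac{1}{N} (a_N - a_1) \Big[\sum_{i=1}^{N} (a_i - a_1)^2\Big]^{1/2}.$

On making use of the values $-a_N \le \cdots \le -a_1$ instead of $a_1 \le \cdots \le a_N$,
we also get from (2.17) that

(2.18) $E[a(U_1) - a(R_1/N)]^2 \le \frac{1}{N} (a_N - a_1) \Big[\sum_{i=1}^{N} (a_i - a_N)^2\Big]^{1/2}.$

It is now easy to derive (2.5). Let us put

$a^+(\lambda) = \bar a$ if $a(\lambda) \le \bar a$, $\quad = a(\lambda)$ if $a(\lambda) \ge \bar a$,

$a^-(\lambda) = a(\lambda) - \bar a$ if $a(\lambda) \le \bar a$, $\quad = 0$ if $a(\lambda) \ge \bar a$.

We then have

$a(\lambda) = a^+(\lambda) + a^-(\lambda),$

and, in view of (2.17) and (2.18), the following inequalities are clear:

$E[a(U_1) - a(R_1/N)]^2 \le 2E[a^+(U_1) - a^+(R_1/N)]^2 + 2E[a^-(U_1) - a^-(R_1/N)]^2$

$\le \frac{2}{N} (a_N - \bar a) \Big[\sum_{a_i \ge \bar a} (a_i - \bar a)^2\Big]^{1/2} + \frac{2}{N} (\bar a - a_1) \Big[\sum_{a_i \le \bar a} (a_i - \bar a)^2\Big]^{1/2}$

$\le \frac{2}{N} \max_{1 \le i \le N} |a_i - \bar a| \Big\{\Big[\sum_{a_i \ge \bar a} (a_i - \bar a)^2\Big]^{1/2} + \Big[\sum_{a_i \le \bar a} (a_i - \bar a)^2\Big]^{1/2}\Big\}$

$\le 2\sqrt{2}\, \max_{1 \le i \le N} |a_i - \bar a| \cdot \frac{1}{N} \Big[\sum_{i=1}^{N} (a_i - \bar a)^2\Big]^{1/2}.$

This completes the proof.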
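
The inequality of Lemma 2.1 can be checked numerically for a small example. The sketch below computes $E[a(U_1) - a(R_1/N)]^2$ exactly through the representation (2.6), using the fact that the $i$-th order statistic of $N$ uniforms satisfies $P(Z_i \le t) = P(\mathrm{Bin}(N, t) \ge i)$, and compares it with the bound $2\sqrt{2}\,\max_i |a_i - \bar a|\, N^{-1} [\sum_i (a_i - \bar a)^2]^{1/2}$. The function names and the test sequence $a_i = i$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, sqrt


def order_stat_cdf(i, n, t):
    """P(Z_i <= t) for the i-th smallest of n independent uniform variables."""
    return sum(comb(n, m) * t**m * (1 - t) ** (n - m) for m in range(i, n + 1))


def mean_square_gap(a):
    """Exact E[a(U_1) - a(R_1/N)]^2 via (2.6), for a nondecreasing a_1..a_N."""
    n = len(a)
    total = Fraction(0)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            # Probability that Z_i falls in ((j-1)/N, j/N], where a(Z_i) = a_j.
            p = order_stat_cdf(i, n, Fraction(j, n)) - order_stat_cdf(i, n, Fraction(j - 1, n))
            total += p * (a[j - 1] - a[i - 1]) ** 2
    return total / n


def lemma_bound(a):
    """Right-hand side 2*sqrt(2) * max|a_i - mean| * (1/N) * sqrt(sum (a_i - mean)^2)."""
    n = len(a)
    mean = sum(a) / n
    spread = max(abs(x - mean) for x in a)
    return 2 * sqrt(2) * float(spread) * sqrt(float(sum((x - mean) ** 2 for x in a))) / n


a_seq = [Fraction(i) for i in range(1, 11)]  # hypothetical example: a_i = i, N = 10
gap = mean_square_gap(a_seq)
print(float(gap), lemma_bound(a_seq))
```

For this sequence the exact left-hand side is far below the bound, which is consistent with the lemma being a worst-case estimate.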





Let us have a nondecreasing quadratically integrable function $\varphi(\lambda)$, $0 < \lambda < 1$,
and put

(2.19) $\varphi_N(\lambda) = \varphi[i/(N+1)]$ if $(i-1)/N < \lambda \le i/N$.

Lemma 2.2. The functions $\varphi_N^2(\lambda)$, $0 < \lambda < 1$, are uniformly ($N \ge 1$) integrable
and

(2.20) $\lim_{N \to \infty} \int_0^1 [\varphi_N(\lambda) - \varphi(\lambda)]^2\, d\lambda = 0.$


Proof. Suppose that $\varphi(0) = 0$ so that both $\varphi(\lambda)$ and $\varphi^2(\lambda)$ are non-decreasing.
Let $A$ be a subset of $[0, 1]$ and $\mu(A)$ its Lebesgue measure. Put

$I_k = ((k-1)/N,\ k/N)$

and $J_k = (k/N - \mu(A \cap I_k),\ k/N)$ and note that $\varphi_N(\lambda) = \varphi(k/(N+1))$ for
$\lambda \in J_k$ and $\varphi_N(\lambda) \le \varphi(\lambda)$ for $k/(N+1) \le \lambda < k/N$. It holds, obviously,


(2.21) [ | Oud) dd = wlA NM Ta)e(k/(N + 1) S OV + DE [ e(n) ad 


so that, on making use of the first right hand expression for k S 3N/4 and of 
the second one for k = N/4, we get 


a 
(2.22) [ &o dy = > | gw(d) dd S ¢'(3/4)u(A) +4 / g(r) dr 
A k=l Y ANTE B 

where $\mu(B) = \mu(A)$. Inequality (2.22) clearly proves uniform integrability of
the functions $\{\varphi_N^2(\lambda)\}$. A general $\varphi(\lambda)$ may be written in the form $\varphi(\lambda) =
\varphi_1(\lambda) - \varphi_2(1 - \lambda)$, where $\varphi_1(0) = 0$, $\varphi_2(0) = 0$ and both $\varphi_1(\lambda)$ and $\varphi_2(\lambda)$ are
non-decreasing. This completes the proof of uniform integrability.

In order to prove (2.20), let us observe that $\varphi_N(\lambda) \to \varphi(\lambda)$ on the set of
continuity points of $\varphi(\lambda)$, which, however, has Lebesgue measure 1. Convergence of
$\varphi_N(\lambda)$ to $\varphi(\lambda)$ almost everywhere, together with uniform integrability of the
functions $\varphi_N^2(\lambda)$, implies (2.20). The proof is completed.

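Lemma 2.3. Let $c_1, \ldots, c_N$, $d_1, \ldots, d_N$ be arbitrary real numbers and put
$\bar c = N^{-1} \sum_{i=1}^{N} c_i$, $\bar d = N^{-1} \sum_{i=1}^{N} d_i$. Then

(2.23) $\operatorname{var}\Big(\sum_{i=1}^{N} c_i d_{R_i}\Big) = \frac{1}{N-1} \sum_{i=1}^{N} (c_i - \bar c)^2 \sum_{i=1}^{N} (d_i - \bar d)^2 \le \frac{1}{N-1} \sum_{i=1}^{N} (c_i - \bar c)^2 \sum_{i=1}^{N} d_i^2.$

The proof is immediate.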
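
The variance formula of Lemma 2.3, $\operatorname{var}(\sum_i c_i d_{R_i}) = (N-1)^{-1} \sum_i (c_i - \bar c)^2 \sum_i (d_i - \bar d)^2$, can be confirmed by brute force for a small $N$. The following sketch enumerates all permutations with exact rational arithmetic; the numbers and function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def permutation_variance(c, d):
    """Exact variance of sum_i c[i] * d[R_i] when R is a uniformly random permutation."""
    n = len(c)
    values = [sum(c[i] * d[p[i]] for i in range(n)) for p in permutations(range(n))]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def lemma_2_3_formula(c, d):
    """Closed form (1/(N-1)) * sum (c_i - c_bar)^2 * sum (d_i - d_bar)^2."""
    n = len(c)
    c_bar = sum(c) / n
    d_bar = sum(d) / n
    return sum((x - c_bar) ** 2 for x in c) * sum((y - d_bar) ** 2 for y in d) / (n - 1)


# Hypothetical values, chosen only to exercise the identity.
c_vals = [Fraction(x) for x in (1, 4, 2, 8)]
d_vals = [Fraction(x) for x in (3, 1, 4, 1)]
print(permutation_variance(c_vals, d_vals), lemma_2_3_formula(c_vals, d_vals))
```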


3. Asymptotic equivalency in the mean. Random variables $S_\nu$ and $T_\nu$ will
be called asymptotically equivalent in the mean (symbolically, $S_\nu \sim T_\nu$) if

(3.1) $\lim_{\nu \to \infty} E(S_\nu - T_\nu)^2 / (\operatorname{var} T_\nu) = 0.$

The relation is symmetric, and, if $S_\nu \sim T_\nu$ and $T_\nu \sim V_\nu$, then, clearly, $S_\nu \sim V_\nu$.





Let us take a sequence of independent random variables $U_1, U_2, \ldots$ each
uniformly distributed over $(0, 1]$ and denote the rank of $U_i$ in the partial sequence
$U_1, \ldots, U_{N_\nu}$ by $R_{\nu i}$, $1 \le i \le N_\nu$, $\nu \ge 1$. The partial sequence $U_1, \ldots, U_{N_\nu}$,
reordered in ascending magnitude, will be denoted by $Z_{\nu 1}, \ldots, Z_{\nu N_\nu}$.

The distribution of $S_\nu$ given by (1.1) does not depend on the ordering of the
a's. So we may suppose that

(3.2) $a_{\nu 1} \le \cdots \le a_{\nu N_\nu}$ $\quad (\nu \ge 1)$.

We shall assume that the a's fulfill the condition

(3.3) $\lim_{\nu \to \infty} \dfrac{\max_{1 \le i \le N_\nu} (a_{\nu i} - \bar a_\nu)^2}{\sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2} = 0.$

THEOREM 3.1. Under the assumptions (3.2) and (3.3) the statistic $S_\nu$ given by
(1.1) is asymptotically equivalent in the mean to the statistic

(3.4) $T_\nu = \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)\, a_\nu(U_i) + \bar b_\nu \sum_{i=1}^{N_\nu} a_{\nu i},$

where $a_\nu(\cdot)$ denotes the quantile function of $a_{\nu 1} \le \cdots \le a_{\nu N_\nu}$ given by (2.2),
and $\bar b_\nu = N_\nu^{-1} \sum_{i=1}^{N_\nu} b_{\nu i}$.


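Proof. On making use of (2.1) and (2.3), we may write

(3.5) $S_\nu - T_\nu = -\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu) \Big[a_\nu(U_i) - a_\nu\Big(\frac{R_{\nu i}}{N_\nu}\Big)\Big].$

As is well known, the distribution of the ranks $(R_{\nu 1}, \ldots, R_{\nu N_\nu})$ is independent
of the vector $(Z_{\nu 1}, \ldots, Z_{\nu N_\nu})$. In view of Lemma 2.3, where we put

$c_i = b_{\nu i} - \bar b_\nu$

and $d_i = a_\nu(Z_{\nu i}) - a_\nu(i/N_\nu)$, we can write

(3.6) $E\{(S_\nu - T_\nu)^2 \mid Z_{\nu 1}, \ldots, Z_{\nu N_\nu}\} = \operatorname{var}\{S_\nu - T_\nu \mid Z_{\nu 1}, \ldots, Z_{\nu N_\nu}\}$

$\le \frac{1}{N_\nu - 1} \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 \sum_{i=1}^{N_\nu} \Big[a_\nu(Z_{\nu i}) - a_\nu\Big(\frac{i}{N_\nu}\Big)\Big]^2$

$= \frac{1}{N_\nu - 1} \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 \sum_{i=1}^{N_\nu} \Big[a_\nu(U_i) - a_\nu\Big(\frac{R_{\nu i}}{N_\nu}\Big)\Big]^2.$

The first equality in (3.6) is ensured by $E(S_\nu - T_\nu \mid Z_{\nu 1}, \ldots, Z_{\nu N_\nu}) = 0$, which
follows from $\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu) = 0$. Taking the mean value over $Z_{\nu 1}, \ldots, Z_{\nu N_\nu}$
on both sides of the inequality (3.6), we obtain

(3.7) $E(S_\nu - T_\nu)^2 \le \frac{1}{N_\nu - 1} \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 \sum_{i=1}^{N_\nu} E\Big[a_\nu(U_i) - a_\nu\Big(\frac{R_{\nu i}}{N_\nu}\Big)\Big]^2.$

Clearly,

(3.8) $E\Big[a_\nu(U_i) - a_\nu\Big(\frac{R_{\nu i}}{N_\nu}\Big)\Big]^2 = E\Big[a_\nu(U_1) - a_\nu\Big(\frac{R_{\nu 1}}{N_\nu}\Big)\Big]^2$ $\quad (1 \le i \le N_\nu,\ \nu \ge 1)$,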





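so that (3.7) may be put in the form

(3.9) $E(S_\nu - T_\nu)^2 \le \frac{N_\nu}{N_\nu - 1}\, E\Big[a_\nu(U_1) - a_\nu\Big(\frac{R_{\nu 1}}{N_\nu}\Big)\Big]^2 \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2.$

On the other hand, it follows from (3.4) that

(3.10) $\operatorname{var} T_\nu = (1/N_\nu) \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2 \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2.$

Making use of Lemma 2.1, (3.9) and (3.10) yield

(3.11) $\dfrac{E(S_\nu - T_\nu)^2}{\operatorname{var} T_\nu} \le 2\sqrt{2}\, \dfrac{N_\nu}{N_\nu - 1}\, \dfrac{\max_{1 \le i \le N_\nu} |a_{\nu i} - \bar a_\nu|}{\big[\sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2\big]^{1/2}}.$

Consequently, the relation (3.1) follows from (3.11) and (3.3). The proof is
completed.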
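
A small Monte Carlo sketch may help to see Theorem 3.1 at work. It draws uniform variables, forms the rank statistic $S$ of (1.1) and the sum of independent terms $T$ of (3.4) from the same draws, and estimates $E(S - T)^2 / \operatorname{var} T$; the estimate is then compared with the bound $2\sqrt{2}\, N (N-1)^{-1} \max_i |a_i - \bar a| / [\sum_i (a_i - \bar a)^2]^{1/2}$ that follows from Lemma 2.1. The sample size, the number of replications, the seed, and the choice $a_i = i$ with a 0/1 vector $b$ are all assumptions of this illustration.

```python
import math
import random


def simulate_ratio(a, b, reps, seed=0):
    """Monte Carlo estimate of E(S - T)^2 / var T for nondecreasing a and arbitrary b."""
    rng = random.Random(seed)
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    var_t = sum((x - a_bar) ** 2 for x in a) * sum((y - b_bar) ** 2 for y in b) / n
    total = 0.0
    for _ in range(reps):
        u = [1.0 - rng.random() for _ in range(n)]  # uniform on (0, 1]
        order = sorted(range(n), key=u.__getitem__)
        rank = [0] * n
        for pos, idx in enumerate(order):
            rank[idx] = pos + 1
        s = sum(b[i] * a[rank[i] - 1] for i in range(n))
        # a(U_i) is the quantile function (2.2) evaluated at U_i.
        t = sum((b[i] - b_bar) * a[math.ceil(u[i] * n) - 1] for i in range(n)) + b_bar * sum(a)
        total += (s - t) ** 2
    return total / reps / var_t


def bound_3_11(a):
    """Upper bound 2*sqrt(2) * N/(N-1) * max|a_i - a_bar| / sqrt(sum (a_i - a_bar)^2)."""
    n = len(a)
    a_bar = sum(a) / n
    return (2 * math.sqrt(2) * n / (n - 1) * max(abs(x - a_bar) for x in a)
            / math.sqrt(sum((x - a_bar) ** 2 for x in a)))


a_demo = [float(i) for i in range(1, 201)]  # hypothetical scores a_i = i
b_demo = [1.0] * 100 + [0.0] * 100          # hypothetical two-sample indicator
ratio = simulate_ratio(a_demo, b_demo, reps=200)
print(ratio, bound_3_11(a_demo))
```

In this setting the estimated ratio is much smaller than the bound, and both shrink as $N$ grows, which is the content of the asymptotic equivalence.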


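THEOREM 3.2. Let the function $\varphi(\lambda)$ be non-decreasing, non-constant and
quadratically integrable. Suppose that

(3.12) $\lim_{\nu \to \infty} N_\nu = \infty.$

Then the statistics

(3.13) $S_\nu = \sum_{i=1}^{N_\nu} b_{\nu i}\, \varphi[R_{\nu i}/(N_\nu + 1)]$

and

(3.14) $T_\nu = \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)\, \varphi(U_i) + \bar b_\nu \sum_{i=1}^{N_\nu} \varphi[i/(N_\nu + 1)]$

are asymptotically equivalent in the mean.
Proof. If we put

(3.15) $a_{\nu i} = \varphi[i/(N_\nu + 1)],$

the quantile function of $a_{\nu 1} \le \cdots \le a_{\nu N_\nu}$ will equal $\varphi_{N_\nu}(\lambda)$ expressed by (2.19).
According to Lemma 2.2, the functions $\varphi_{N_\nu}^2(\lambda)$ are uniformly integrable and
hence

(3.16) $\lim_{\nu \to \infty} N_\nu^{-1} \max_{1 \le i \le N_\nu} |a_{\nu i}|^2 = \lim_{\nu \to \infty} \max_{1 \le i \le N_\nu} \int_{(i-1)/N_\nu}^{i/N_\nu} \varphi_{N_\nu}^2(\lambda)\, d\lambda = 0.$

On the other hand, from (2.20) and from non-constancy of $\varphi(\lambda)$, it follows
that

(3.17) $\lim_{\nu \to \infty} \frac{1}{N_\nu} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2 = \int_0^1 \Big[\varphi(\lambda) - \int_0^1 \varphi(x)\, dx\Big]^2 d\lambda > 0.$

Relations (3.16) and (3.17) imply that the sequences $a_{\nu 1} \le \cdots \le a_{\nu N_\nu}$ fulfill
condition (3.3). Therefore, in view of Theorem 3.1, the statistic (3.13) is
asymptotically equivalent to the statistic

(3.18) $T_\nu' = \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)\, \varphi_{N_\nu}(U_i) + \bar b_\nu \sum_{i=1}^{N_\nu} \varphi[i/(N_\nu + 1)].$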


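It remains to show that (3.18) is equivalent to (3.14). However, as is easy to see,

(3.19) $\dfrac{E(T_\nu - T_\nu')^2}{\operatorname{var} T_\nu} \le \dfrac{\int_0^1 [\varphi_{N_\nu}(\lambda) - \varphi(\lambda)]^2\, d\lambda}{\int_0^1 \big[\varphi(\lambda) - \int_0^1 \varphi(x)\, dx\big]^2 d\lambda},$

so that $T_\nu \sim T_\nu'$ is a consequence of the assumption (3.12) and of Lemma 2.2.
Theorem 3.2 is proved.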





4. Necessary and sufficient condition for asymptotic normality of $S_\nu$. If
$S_\nu \sim T_\nu$, then obviously, the asymptotic variance and the asymptotic
distribution of $S_\nu$ and $T_\nu$ exist under the same conditions, and, if they exist, are the
same. Thus the problem of the asymptotic distribution of $S_\nu$ is reduced to the
problem of the asymptotic distribution of $T_\nu$. The statistic $T_\nu$, however, is a
sum of independent addends, so that, if these addends are infinitesimal, it
suffices to use well-known theory [11].


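THEOREM 4.1. Let us suppose that

(4.1) $\lim_{\nu \to \infty} \dfrac{\max_{1 \le i \le N_\nu} (a_{\nu i} - \bar a_\nu)^2}{\sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2} = 0$

and

(4.2) $\lim_{\nu \to \infty} \dfrac{\max_{1 \le i \le N_\nu} (b_{\nu i} - \bar b_\nu)^2}{\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2} = 0.$

Then the statistic (1.1) has an asymptotically normal distribution with mean value
$ES_\nu$ and variance $\operatorname{var} S_\nu$ if, and only if, for any $\tau > 0$

(4.3) $\lim_{\nu \to \infty} \frac{1}{N_\nu} \sum\sum_{|\delta_{\nu ij}| > \tau} \delta_{\nu ij}^2 = 0,$

where

(4.4) $\delta_{\nu ij} = \dfrac{(b_{\nu i} - \bar b_\nu)(a_{\nu j} - \bar a_\nu)}{\Big[\frac{1}{N_\nu} \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 \sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2\Big]^{1/2}}$ $\quad (1 \le i, j \le N_\nu,\ \nu \ge 1)$.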
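
The quantity appearing in the Lindeberg-type condition can be evaluated directly for given finite sequences. The sketch below forms $\delta_{ij} = (b_i - \bar b)(a_j - \bar a) / [N^{-1} \sum_i (b_i - \bar b)^2 \sum_j (a_j - \bar a)^2]^{1/2}$ and returns $N^{-1}$ times the sum of $\delta_{ij}^2$ over pairs with $|\delta_{ij}| > \tau$. The function name and the example sequences are assumptions for illustration, and a finite computation of this kind can only suggest, not establish, the limiting behaviour.

```python
import math


def lindeberg_sum(a, b, tau):
    """(1/N) * sum of delta_ij^2 over pairs with |delta_ij| > tau, delta_ij as in (4.4)."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    scale = math.sqrt(sum((y - b_bar) ** 2 for y in b) * sum((x - a_bar) ** 2 for x in a) / n)
    total = 0.0
    for y in b:
        for x in a:
            delta = (y - b_bar) * (x - a_bar) / scale
            if abs(delta) > tau:
                total += delta * delta
    return total / n


a_ex = [float(i) for i in range(1, 51)]  # hypothetical scores
b_ex = [1.0] * 10 + [0.0] * 40           # hypothetical indicator of a sample of size 10
for tau in (0.0, 0.1, 0.2, 0.5):
    print(tau, lindeberg_sum(a_ex, b_ex, tau))
```

With $\tau = 0$ every nonzero term is counted, so the value is 1 by the normalization in (4.4); the condition asks that the tail part vanish in the limit for each fixed positive $\tau$.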


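Proof. Assuming that $a_{\nu 1} \le \cdots \le a_{\nu N_\nu}$, then from (4.1) and Theorem 3.1
it follows that (1.1) is asymptotically equivalent in the mean to (3.4). Therefore,
it suffices to show that, under the additional assumption (4.2), the condition
(4.3) is necessary and sufficient for the asymptotic normality of $T_\nu$ with
mean value $ET_\nu$ and variance $\operatorname{var} T_\nu$. The assumption (4.2), however, implies
that the addends in (3.4) are infinitesimal, because

(4.5) $\dfrac{\max_{1 \le i \le N_\nu} \operatorname{var}[(b_{\nu i} - \bar b_\nu)\, a_\nu(U_i)]}{\operatorname{var} T_\nu} = \dfrac{\max_{1 \le i \le N_\nu} (b_{\nu i} - \bar b_\nu)^2\, \frac{1}{N_\nu} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2}{\frac{1}{N_\nu} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2 \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2} = \dfrac{\max_{1 \le i \le N_\nu} (b_{\nu i} - \bar b_\nu)^2}{\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2}.$

Consequently, we have to prove that (4.3) coincides with the Lindeberg condition
for $T_\nu$, namely with

(4.6) $\lim_{\nu \to \infty} \sum_{i=1}^{N_\nu} \frac{1}{\operatorname{var} T_\nu} \int_{|x| > \tau (\operatorname{var} T_\nu)^{1/2}} x^2\, dP\{(b_{\nu i} - \bar b_\nu)[a_\nu(U_i) - \bar a_\nu] < x\} = 0.$

Clearly, we have

(4.7) $\frac{1}{\operatorname{var} T_\nu} \int_{|x| > \tau (\operatorname{var} T_\nu)^{1/2}} x^2\, dP\{(b_{\nu i} - \bar b_\nu)[a_\nu(U_i) - \bar a_\nu] < x\} = \dfrac{\sum_{j \in E_{\nu i}} [(b_{\nu i} - \bar b_\nu)^2 (a_{\nu j} - \bar a_\nu)^2]}{\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 \sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2},$

where

(4.8) $E_{\nu i} = \Big\{j : |(b_{\nu i} - \bar b_\nu)(a_{\nu j} - \bar a_\nu)| > \tau \Big[\frac{1}{N_\nu} \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 \sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2\Big]^{1/2}\Big\}.$

Now, observe that

(4.9) $\frac{1}{N_\nu} \sum\sum_{|\delta_{\nu ij}| > \tau} \delta_{\nu ij}^2 = \dfrac{\sum_{i=1}^{N_\nu} \sum_{j \in E_{\nu i}} [(b_{\nu i} - \bar b_\nu)^2 (a_{\nu j} - \bar a_\nu)^2]}{\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 \sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2},$

so that (4.3) is actually equivalent to (4.6). The proof is accomplished.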

The condition (4.3) is symmetrical in the a’s and the b’s. In applications, how- 
ever, the a’s and the b’s often play a somewhat different role. In such cases the 
following theorem is useful: 

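THEOREM 4.2. A double sequence $\{a_{\nu i},\ 1 \le i \le N_\nu,\ \nu \ge 1\}$ satisfying the
condition (4.1) fulfills the condition (4.3) for any double sequence $\{b_{\nu i},\ 1 \le i \le N_\nu,\
\nu \ge 1\}$ satisfying the condition (4.2) if, and only if,

(4.10) $\Big[\lim_{\nu \to \infty} \dfrac{k_\nu}{N_\nu} = 0\Big] \Rightarrow \Big[\lim_{\nu \to \infty} \dfrac{\max_{1 \le i_1 < \cdots < i_{k_\nu} \le N_\nu} \sum_{\alpha=1}^{k_\nu} (a_{\nu i_\alpha} - \bar a_\nu)^2}{\sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2} = 0\Big].$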





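Proof. First, let us prove that (4.2) and (4.10) imply (4.3). Put, for a fixed
$\tau > 0$,

(4.11) $E_\nu = \Big\{j : (a_{\nu j} - \bar a_\nu)^2 > \tau^2\, \dfrac{\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2}{\max_{1 \le i \le N_\nu} (b_{\nu i} - \bar b_\nu)^2}\, \dfrac{1}{N_\nu} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2\Big\}$

and denote the number of elements in $E_\nu$ by $k_\nu$. Clearly

$k_\nu\, \tau^2\, \dfrac{\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2}{\max_{1 \le i \le N_\nu} (b_{\nu i} - \bar b_\nu)^2}\, \dfrac{1}{N_\nu} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2 < \sum_{j \in E_\nu} (a_{\nu j} - \bar a_\nu)^2 \le \sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2,$

i.e.,

$\dfrac{k_\nu}{N_\nu} < \tau^{-2}\, \dfrac{\max_{1 \le i \le N_\nu} (b_{\nu i} - \bar b_\nu)^2}{\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2},$

from which, in view of (4.2), it follows that

(4.12) $\lim_{\nu \to \infty} (k_\nu / N_\nu) = 0.$

Relation (4.12), according to (4.10), implies that

(4.13) $\lim_{\nu \to \infty} \dfrac{\sum_{j \in E_\nu} (a_{\nu j} - \bar a_\nu)^2}{\sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2} = 0.$

Now, from (4.8) and (4.11) it follows that $E_{\nu i} \subset E_\nu$ and, consequently,

(4.14) $\dfrac{\sum_{i=1}^{N_\nu} \sum_{j \in E_{\nu i}} [(b_{\nu i} - \bar b_\nu)^2 (a_{\nu j} - \bar a_\nu)^2]}{\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 \sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2} \le \dfrac{\sum_{j \in E_\nu} (a_{\nu j} - \bar a_\nu)^2}{\sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2}.$

Finally, (4.3) is an obvious consequence of (4.9), (4.14) and (4.13).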

Second, let us assume that (4.10) does not hold. Then there exists a sequence
$B_\nu$ of sets of integers such that, first, $B_\nu \subset \{1, \ldots, N_\nu\}$, second, the numbers of
elements in $B_\nu$, say $l_\nu$, satisfy the relation

(4.15) $\lim_{\nu \to \infty} (l_\nu / N_\nu) = 0$

and, third,

(4.16) $\limsup_{\nu \to \infty} \dfrac{\sum_{j \in B_\nu} (a_{\nu j} - \bar a_\nu)^2}{\sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2} > 0.$





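If

(4.17) $C_\nu = \Big\{j : (a_{\nu j} - \bar a_\nu)^2 > \Big(\dfrac{N_\nu}{l_\nu}\Big)^{1/2} \dfrac{1}{N_\nu} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2\Big\},$

and $\bar C_\nu = \{1, 2, \ldots, N_\nu\} - C_\nu$, then, clearly,

$\dfrac{\sum_{j \in B_\nu \cap \bar C_\nu} (a_{\nu j} - \bar a_\nu)^2}{\sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2} \le \Big(\dfrac{l_\nu}{N_\nu}\Big)^{1/2},$

and, therefore, according to (4.15),

$\lim_{\nu \to \infty} \dfrac{\sum_{j \in B_\nu \cap \bar C_\nu} (a_{\nu j} - \bar a_\nu)^2}{\sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2} = 0.$

Consequently, in view of (4.16),

(4.18) $\limsup_{\nu \to \infty} \dfrac{\sum_{j \in C_\nu} (a_{\nu j} - \bar a_\nu)^2}{\sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2} \ge \limsup_{\nu \to \infty} \dfrac{\sum_{j \in B_\nu} (a_{\nu j} - \bar a_\nu)^2}{\sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2} > 0.$

Now, put

(4.19) $b_{\nu 1} = \cdots = b_{\nu n_\nu} = 1, \quad b_{\nu, n_\nu + 1} = \cdots = b_{\nu N_\nu} = 0,$

where $n_\nu$ is determined by

(4.20) $n_\nu \le (N_\nu / l_\nu)^{1/4} < n_\nu + 1.$

We have, obviously,

(4.21) $\lim_{\nu \to \infty} (n_\nu / N_\nu) = 0$

and, in view of (4.15),

(4.22) $\lim_{\nu \to \infty} n_\nu = \infty.$

Furthermore,

(4.23) $\bar b_\nu = n_\nu / N_\nu,$

(4.24) $\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 = [n_\nu (N_\nu - n_\nu)] / N_\nu,$

(4.25) $\max_{1 \le i \le N_\nu} (b_{\nu i} - \bar b_\nu)^2 = \max\Big[\Big(1 - \dfrac{n_\nu}{N_\nu}\Big)^2, \Big(\dfrac{n_\nu}{N_\nu}\Big)^2\Big].$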



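Consequently, the relations (4.21) and (4.22) ensure that the condition (4.2)
is satisfied.
The sets $E_{\nu i}$, given generally by (4.8), will now be, for $1 \le i \le n_\nu$, defined by

(4.26) $E_{\nu i} = E_{\nu 1} = \Big\{j : (a_{\nu j} - \bar a_\nu)^2 > \tau^2\, \dfrac{n_\nu N_\nu}{N_\nu - n_\nu}\, \dfrac{1}{N_\nu} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2\Big\}$ $\quad (1 \le i \le n_\nu)$.

From (4.20) it follows that, for $\nu$ sufficiently large, the set $C_\nu$ given by (4.17)
will be included in the set $E_{\nu 1}$ given by (4.26). Therefore

(4.27) $\dfrac{\sum_{i=1}^{N_\nu} \sum_{j \in E_{\nu i}} [(b_{\nu i} - \bar b_\nu)^2 (a_{\nu j} - \bar a_\nu)^2]}{\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 \sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2} \ge \dfrac{\sum_{i=1}^{n_\nu} \sum_{j \in E_{\nu 1}} [(b_{\nu i} - \bar b_\nu)^2 (a_{\nu j} - \bar a_\nu)^2]}{\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2 \sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2}$

$= \Big(1 - \dfrac{n_\nu}{N_\nu}\Big) \dfrac{\sum_{j \in E_{\nu 1}} (a_{\nu j} - \bar a_\nu)^2}{\sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2} \ge \Big(1 - \dfrac{n_\nu}{N_\nu}\Big) \dfrac{\sum_{j \in C_\nu} (a_{\nu j} - \bar a_\nu)^2}{\sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2}.$

The relations (4.27), (4.18) and (4.21) imply that (4.3) cannot hold, and the
theorem is thereby proved.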

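Let $G_\nu(x)$ denote the distribution function of the numbers $a_{\nu 1} \le \cdots \le a_{\nu N_\nu}$,
i.e.,

(4.28) $G_\nu(x) = \dfrac{\text{number of the a's smaller or equal to } x}{N_\nu}.$

Lemma 4.1. Condition (4.10) may be expressed in any of the following three
forms:

(i) The functions $[a_\nu(\lambda) - \bar a_\nu]^2 \{\int_0^1 [a_\nu(\lambda) - \bar a_\nu]^2\, d\lambda\}^{-1}$ are uniformly integrable.

(ii)
(4.29) $\Big[\lim_{\nu \to \infty} \lambda_\nu = 0\Big] \Rightarrow \Big[\lim_{\nu \to \infty} \dfrac{\big(\int_0^{\lambda_\nu} + \int_{1 - \lambda_\nu}^{1}\big) [a_\nu(\lambda) - \bar a_\nu]^2\, d\lambda}{\int_0^1 [a_\nu(\lambda) - \bar a_\nu]^2\, d\lambda} = 0\Big].$

(iii)
(4.30) $\Big[\lim_{\nu \to \infty} K_\nu = \infty\Big] \Rightarrow \Big[\lim_{\nu \to \infty} \dfrac{1}{\sigma_\nu^2} \int_{|x - \bar a_\nu| > K_\nu \sigma_\nu} (x - \bar a_\nu)^2\, dG_\nu(x) = 0\Big],$

where

(4.31) $\sigma_\nu^2 = \dfrac{1}{N_\nu} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2 = \int_0^1 [a_\nu(\lambda) - \bar a_\nu]^2\, d\lambda = \int (x - \bar a_\nu)^2\, dG_\nu(x).$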

Proof. If $a_{\nu 1} \le \cdots \le a_{\nu N_\nu}$, then surely

(4.32) $\max_{1 \le i_1 < \cdots < i_{k_\nu} \le N_\nu} \sum_{\alpha=1}^{k_\nu} (a_{\nu i_\alpha} - \bar a_\nu)^2 = \sum_{i=1}^{k_{\nu 1}} (a_{\nu i} - \bar a_\nu)^2 + \sum_{i=N_\nu - k_{\nu 2} + 1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2,$

where $k_{\nu 1} + k_{\nu 2} = k_\nu$. On the other hand, we have

(4.33) $\sum_{i=1}^{k_{\nu 1}} (a_{\nu i} - \bar a_\nu)^2 + \sum_{i=N_\nu - k_{\nu 2} + 1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2 = N_\nu \Big(\int_0^{k_{\nu 1}/N_\nu} + \int_{1 - (k_{\nu 2}/N_\nu)}^{1}\Big) [a_\nu(\lambda) - \bar a_\nu]^2\, d\lambda,$

which proves the equivalency of (4.10) to (ii) and, of course, to (i) as well.
Now we shall prove the equivalency of (4.10) to (4.30). Clearly,

(4.34) $\dfrac{1}{\sigma_\nu^2} \int_{|x - \bar a_\nu| > K_\nu \sigma_\nu} (x - \bar a_\nu)^2\, dG_\nu(x) = \dfrac{1}{N_\nu \sigma_\nu^2} \sum_{|a_{\nu i} - \bar a_\nu| > K_\nu \sigma_\nu} (a_{\nu i} - \bar a_\nu)^2.$

Denoting the number of a's such that $|a_{\nu i} - \bar a_\nu| > K_\nu \sigma_\nu$ by $k_\nu$, we have the
following form of Tchebychev's inequality:

(4.35) $k_\nu K_\nu^2 < \dfrac{1}{\sigma_\nu^2} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2 = N_\nu.$

Assume, first, that (4.10) holds. Then $K_\nu \to \infty$ implies, in view of (4.35),
$k_\nu / N_\nu \to 0$, so that, according to (4.10), the right side of (4.34) tends to 0,
and, consequently, also the left side of (4.34) tends to 0. Thus (4.10) implies
(4.30).

Assume, second, that (4.10) does not hold. On repeating the respective part
of the proof of Theorem 4.2, we get again the relation (4.18), which is equivalent
to

(4.36) $\limsup_{\nu \to \infty} \dfrac{1}{\sigma_\nu^2} \int_{|x - \bar a_\nu| > (N_\nu / l_\nu)^{1/4} \sigma_\nu} (x - \bar a_\nu)^2\, dG_\nu(x) > 0.$

This means that (4.30) is not satisfied for $K_\nu = (N_\nu / l_\nu)^{1/4}$. So the negation of
(4.10) implies the negation of (4.30), i.e., (4.30) implies (4.10). Lemma 4.1 is
proved.

Corollary. The statistic (3.13) is asymptotically normal with mean value
$ES_\nu$ and variance $\operatorname{var} S_\nu$ for any double sequence $\{b_{\nu i},\ 1 \le i \le N_\nu,\ \nu \ge 1\}$
satisfying (4.2).

Proof. By Lemma 2.2 and Lemma 4.1 the numbers $a_{\nu i} = \varphi[i/(N_\nu + 1)]$
fulfill the condition (4.10). It suffices to apply Theorem 4.2.

EXAMPLE 4.1. If the b's are given by (4.19), then the statistic

(4.37) $S_\nu = \sum_{i=1}^{N_\nu} b_{\nu i}\, a_{\nu R_{\nu i}} = \sum_{i=1}^{n_\nu} a_{\nu R_{\nu i}}$

represents a sum of $n_\nu$ elements selected by simple random sampling from the
population $\{a_{\nu 1}, \ldots, a_{\nu N_\nu}\}$. The condition (4.2) is fulfilled, according to (4.25),
if, and only if,

(4.38) $\lim_{\nu \to \infty} n_\nu = \lim_{\nu \to \infty} (N_\nu - n_\nu) = \infty.$

Hence, provided that (4.1) holds, the distribution of $S_\nu$ is asymptotically normal
with mean value $ES_\nu$ and variance $\operatorname{var} S_\nu$ for all $n_\nu$ satisfying (4.38) if, and only
if, the a's satisfy the condition (4.10).

As we shall see in Section 5, (4.10) is fulfilled, for example, if the populations
$\{a_{\nu 1}, \ldots, a_{\nu N_\nu}\}$ have uniformly bounded excesses. See also [9] and [10] for
further results concerning sampling from a finite population.
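
For the simple random sampling case of Example 4.1 the first two moments of the sample sum are available in closed form: the mean is $n \bar a$, and Lemma 2.3 with 0/1 values of $b$ gives the variance $n(N - n)[N(N - 1)]^{-1} \sum_i (a_i - \bar a)^2$. The sketch below checks both against a full enumeration of samples; the population and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def srs_sum_moments(population, n):
    """Exact mean and variance of the sum of a simple random sample of size n."""
    sums = [sum(s) for s in combinations(population, n)]
    mean = Fraction(sum(sums), len(sums))
    var = sum((x - mean) ** 2 for x in sums) / len(sums)
    return mean, var


population = [2, 3, 5, 7, 11, 13]  # hypothetical finite population
n_sample = 2
mean_sum, var_sum = srs_sum_moments(population, n_sample)

N_pop = len(population)
a_bar = Fraction(sum(population), N_pop)
ss = sum((x - a_bar) ** 2 for x in population)
formula_var = Fraction(n_sample * (N_pop - n_sample), N_pop * (N_pop - 1)) * ss
print(mean_sum, var_sum, formula_var)
```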


5. Comparison of various conditions. First let us introduce the following

Notation. The condition (4.1), introduced by Noether [3] and simplified by
Hoeffding [4], will be denoted by N.

The Lindeberg condition (4.3) will be denoted by L.

The condition (4.10) will be denoted by Q.

The Wald-Wolfowitz [1] condition

(5.1) $\dfrac{\frac{1}{N_\nu} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^r}{\Big[\frac{1}{N_\nu} \sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2\Big]^{r/2}} = O(1)$ $\quad (r = 3, 4, \ldots)$,

where $O(1)$ denotes uniform boundedness, will be denoted by W.
The Hoeffding [4] condition

(5.2) $\lim_{\nu \to \infty} N_\nu^{(r/2) - 1}\, \dfrac{\sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^r \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^r}{\Big[\sum_{i=1}^{N_\nu} (a_{\nu i} - \bar a_\nu)^2\Big]^{r/2} \Big[\sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2\Big]^{r/2}} = 0$ $\quad (r = 3, 4, \ldots)$

will be denoted by H.

Observe that the conditions L and H concern $\{a_{\nu i}, b_{\nu i},\ 1 \le i \le N_\nu,\ \nu \ge 1\}$,
whereas the conditions N, Q, W are applied to each double sequence $\{a_{\nu i},\ 1 \le
i \le N_\nu,\ \nu \ge 1\}$ and $\{b_{\nu i},\ 1 \le i \le N_\nu,\ \nu \ge 1\}$ separately. The fact that $\{b_{\nu i},\
1 \le i \le N_\nu,\ \nu \ge 1\}$ satisfies N and $\{a_{\nu i},\ 1 \le i \le N_\nu,\ \nu \ge 1\}$ satisfies Q
will be denoted by NQ, and the symbols NN, NW and WW will have similar
interpretations.

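THEOREM 5.1. WW $\Rightarrow$ NW $\Rightarrow$ H $\Rightarrow$ L $\Rightarrow$ NN, and NW $\Rightarrow$ NQ $\Rightarrow$ L.

Proof. For WW $\Rightarrow$ NW $\Rightarrow$ H see Hoeffding [4], and for H $\Rightarrow$ L Motoo [8].
NQ $\Rightarrow$ L follows from Theorem 4.2. Thus it remains to prove NW $\Rightarrow$ NQ,
i.e., W $\Rightarrow$ Q, and L $\Rightarrow$ NN.

W $\Rightarrow$ Q. If we take $r = 4$ in (5.1) and use the quantile function form, we get

(5.3) $\dfrac{\int_0^1 [a_\nu(\lambda) - \bar a_\nu]^4\, d\lambda}{\Big(\int_0^1 [a_\nu(\lambda) - \bar a_\nu]^2\, d\lambda\Big)^2} = O(1).$

As is well known from a theorem due to de la Vallée-Poussin, (5.3) implies that the
functions $[a_\nu(\lambda) - \bar a_\nu]^2 \{\int_0^1 [a_\nu(\lambda) - \bar a_\nu]^2\, d\lambda\}^{-1}$ are uniformly integrable, which is
equivalent to Q (see Lemma 4.1).

L $\Rightarrow$ NN. This fact follows from the inequality

(5.4) $\dfrac{\max_{1 \le j \le N_\nu} (a_{\nu j} - \bar a_\nu)^2}{\sum_{j=1}^{N_\nu} (a_{\nu j} - \bar a_\nu)^2} = \dfrac{1}{N_\nu} \sum_{i=1}^{N_\nu} \max_{1 \le j \le N_\nu} \delta_{\nu ij}^2 \le \varepsilon^2 + \dfrac{1}{N_\nu} \sum\sum_{|\delta_{\nu ij}| > \varepsilon} \delta_{\nu ij}^2$ $\quad (\varepsilon > 0)$,

where the a's can be replaced by the b's.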

Remark 5.1. The condition (5.3) means that the excesses are uniformly
bounded. Denoting this condition by $W_4$, we have W $\Rightarrow$ $W_4$ $\Rightarrow$ Q.

In [7] Dwass considered the empirical distribution function $G_\nu(x)$ of the values
$a_{\nu 1}, \ldots, a_{\nu N_\nu}$ and supposed that, first, $\lim_{\nu \to \infty} G_\nu(x) = G(x)$ at every
continuity point of a distribution function $G(x)$ and, second,


[ caG,(z) [ acz) 0 
(5.6) [2 dG,(x) [z dG(x) = 
These assumptions imply that

$\Big[\lim_{\nu \to \infty} K_\nu = \infty\Big] \Rightarrow \Big[\lim_{\nu \to \infty} \int_{|x| > K_\nu} x^2\, dG_\nu(x) = 0\Big].$

Consequently, by Lemma 4.1, the a's fulfill the condition Q, so that the
respective part of Dwass' theorem [7] is contained in Theorem 4.2.


6. A special case encountered in rank order test theory. In the theory of rank
order tests, locally most powerful tests are based on statistics of the form

(6.1) $S_\nu = \sum_{i=1}^{N_\nu} b_{\nu i}\, E\{\varphi(U_i) \mid R_{\nu i}\},$

where $E\{\cdot \mid R_{\nu i}\}$ denotes the conditional mean value under the condition that
the rank of $U_i$ among the observations $U_1, \ldots, U_{N_\nu}$ equals $R_{\nu i}$. Let us observe
that (6.1) is a special case of (1.1) for

(6.2) $a_{\nu i} = E\{\varphi(U_1) \mid R_{\nu 1} = i\} = E\{\varphi(Z_{\nu i})\},$

where, in the middle expression, the index 1 might be replaced by any index
$j = 1, \ldots, N_\nu$.

For simplicity we shall suppose that, as in previous sections, the U's have
a uniform distribution. This causes no loss of generality, since arbitrarily
distributed observations may be expressed as (non-decreasing) functions of
uniformly distributed observations. We shall also assume that

(6.3) $\int_0^1 \varphi(\lambda)\, d\lambda = 0$

and that

(6.4) $\int_0^1 \varphi^2(\lambda)\, d\lambda < \infty.$

The assumption (6.3), however, is not essential. The integers $N_\nu$ will be assumed
to tend to $\infty$ monotonically, i.e., $N_\nu \le N_{\nu + 1}$, $\nu \ge 1$.
Lemma 6.1. Put

(6.5) $Y_\nu = Y_\nu(R_{\nu 1}) = E\{\varphi(U_1) \mid R_{\nu 1}\}$

and assume that (6.3) and (6.4) hold. Then

(i)
(6.6) $P\{\lim_{\nu \to \infty} Y_\nu = \varphi(U_1)\} = 1,$

(ii) $\{Y_1, Y_2, \ldots, \varphi(U_1)\}$ is a martingale and $\{Y_1^2, Y_2^2, \ldots, \varphi^2(U_1)\}$ is a
semimartingale,

(iii) the random variables $Y_\nu^2$ $(\nu \ge 1)$ are uniformly integrable,

(iv)
(6.7) $\lim_{\nu \to \infty} E|Y_\nu - \varphi(U_1)|^2 = 0$

and

(6.8) $\lim_{\nu \to \infty} EY_\nu^2 = E\varphi^2(U_1).$

Proof. The Borel fields $\mathcal{F}_\nu$ generated by the random vectors $(R_{\nu 1}, \ldots,
R_{\nu N_\nu})$ form an increasing sequence of Borel fields. Denote by $\mathcal{F}_\infty$ the smallest
Borel field containing $\bigcup_\nu \mathcal{F}_\nu$. As is well known, the conditional distribution of
$U_1$ for given $R_{\nu 1} = j$ is the Beta distribution with $p = j$ and $q = N_\nu - j + 1$.
Hence

(6.9) $E\Big(U_1 - \dfrac{R_{\nu 1}}{N_\nu + 1}\Big)^2 = \dfrac{1}{N_\nu} \sum_{j=1}^{N_\nu} \dfrac{j(N_\nu - j + 1)}{(N_\nu + 1)^2 (N_\nu + 2)} < \dfrac{1}{N_\nu},$

from which follows that $U_1$ is equivalent (with probability 1) to a random
variable measurable with respect to $\mathcal{F}_\infty$. Consequently, $\varphi(U_1)$ is also equivalent
to a random variable measurable with respect to $\mathcal{F}_\infty$.
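
The middle expression in (6.9) is the average over $j$ of the variance $j(N - j + 1) / [(N + 1)^2 (N + 2)]$ of a Beta distribution with parameters $j$ and $N - j + 1$. A short exact computation, sketched below with a hypothetical function name, shows that this average simplifies to $1 / [6(N + 1)]$, which is indeed smaller than $1/N$.

```python
from fractions import Fraction


def mean_square_rank_gap(n):
    """(1/N) * sum over j of Var(Beta(j, N-j+1)), the middle expression in (6.9)."""
    return sum(Fraction(j * (n - j + 1), (n + 1) ** 2 * (n + 2)) for j in range(1, n + 1)) / n


for n in (1, 2, 5, 10, 100):
    print(n, mean_square_rank_gap(n), Fraction(1, 6 * (n + 1)))
```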


Since the conditional distribution of $U_1$ for fixed $R_{\nu 1}$ is independent of $R_{\nu 2}, \ldots,
R_{\nu N_\nu}$, we may also write

(6.10) $Y_\nu = E\{\varphi(U_1) \mid R_{\nu 1}, \ldots, R_{\nu N_\nu}\} = E\{\varphi(U_1) \mid \mathcal{F}_\nu\}.$

Now we can apply the theory of martingales. The assertions (i) through (iv)
are consequences of Doob [13], Chap. VII, Theorem 4.4, §1 Example 1, Theorem
1.1, Theorem 3.3 (III), Theorem 4.1s, respectively. (6.8) is a simple consequence
of (6.7).

Lemma 6.2. The numbers $a_{\nu i}$ given by (6.2) satisfy the condition Q given by
(4.10).

Proof. Observe that $Y_\nu$ takes on the values $a_{\nu i}$ with equal probabilities so
that condition (4.10) coincides, in view of (6.3) and (6.8), with uniform
integrability of the random variables $Y_\nu^2$, $\nu \ge 1$, which has been proved by Lemma
6.1.

THEOREM 6.1. If $\{b_{\nu i},\ 1 \le i \le N_\nu,\ \nu \ge 1\}$ fulfill the condition N given by
(4.2) and the function $\varphi(\lambda)$ is non-vanishing and quadratically integrable, then
the statistic (6.1) has an asymptotically normal distribution with mean value $ES_\nu$
and variance $\operatorname{var} S_\nu$.

Proof. It suffices to apply Theorem 4.2 and Lemma 6.2.

THEOREM 6.2. Under the assumptions (6.3) and (6.4), the statistic (6.1) is
asymptotically equivalent in the mean to the statistic

(6.11) $T_\nu = \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)\, \varphi(U_i).$

Proof. By the method used in proving (3.9), we can show that

(6.12) $E(S_\nu - T_\nu)^2 \le [N_\nu / (N_\nu - 1)]\, E[\varphi(U_1) - Y_\nu]^2 \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2.$

In view of (6.5), (6.12) is equivalent to

(6.13) $E(S_\nu - T_\nu)^2 \le [N_\nu / (N_\nu - 1)] [E\varphi^2(U_1) - EY_\nu^2] \sum_{i=1}^{N_\nu} (b_{\nu i} - \bar b_\nu)^2.$

Now it only remains to divide both sides of (6.13) by $\operatorname{var} T_\nu$ and to apply (6.8).

Remark 6.1. Theorem 6.1 generalizes the Dwass theorem [6], and the equality
(6.8) generalizes the respective part of Hoeffding's Theorem 2 [5]. The condition
of convexity of $\varphi(\lambda)$ is removed, which proves the conjecture made by Dwass
in [14], p. 358.


7. Vector considerations. We shall briefly touch the question of asymptotic
m-dimensional normality of a vector $(S_\nu^1, \ldots, S_\nu^m)$ where

(7.1) $S_\nu^g = \sum_{i=1}^{N_\nu} b_{\nu i}\, a^g_{\nu R_{\nu i}},$

i.e., the b's are fixed and the a's depend on $g = 1, \ldots, m$.

THEOREM 7.1. Suppose that for any constants $\lambda_1, \ldots, \lambda_m$

(7.2) $\sum_{i=1}^{N_\nu} \Big[\sum_{g=1}^{m} \lambda_g (a^g_{\nu i} - \bar a^g_\nu)\Big]^2 \ge \varepsilon \max_{1 \le g \le m} \Big[\lambda_g^2 \sum_{i=1}^{N_\nu} (a^g_{\nu i} - \bar a^g_\nu)^2\Big],$

where $\varepsilon$ is positive and independent of $\nu \ge 1$. Assume that the b's fulfill the condition
N given by (4.2) and the $a^g$'s $(g = 1, \ldots, m)$ the condition Q given by (4.10).
Then the vector $(S_\nu^1, \ldots, S_\nu^m)$ has an asymptotically normal distribution with mean
values $ES_\nu^g$ and covariances $\operatorname{cov}(S_\nu^g, S_\nu^h)$, $1 \le g, h \le m$, $\nu \ge 1$.





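Proof. According to a theorem by Cramér [12], p. 105, it suffices to prove
asymptotic normality for any linear combination

$\sum_{g=1}^{m} \lambda_g S_\nu^g = \sum_{i=1}^{N_\nu} b_{\nu i} \Big[\sum_{g=1}^{m} \lambda_g a^g_{\nu R_{\nu i}}\Big].$

This will be done if we show that the numbers $c_{\nu i} = \sum_{g=1}^{m} \lambda_g a^g_{\nu i}$ fulfill the
condition (4.10) for any $\lambda_1, \ldots, \lambda_m$. However, in view of (7.2), we have

$\dfrac{\sum_{\alpha=1}^{k_\nu} (c_{\nu i_\alpha} - \bar c_\nu)^2}{\sum_{i=1}^{N_\nu} (c_{\nu i} - \bar c_\nu)^2} = \dfrac{\sum_{\alpha=1}^{k_\nu} \Big[\sum_{g=1}^{m} \lambda_g (a^g_{\nu i_\alpha} - \bar a^g_\nu)\Big]^2}{\sum_{i=1}^{N_\nu} \Big[\sum_{g=1}^{m} \lambda_g (a^g_{\nu i} - \bar a^g_\nu)\Big]^2}$

$\le \dfrac{m \sum_{g=1}^{m} \lambda_g^2 \sum_{\alpha=1}^{k_\nu} (a^g_{\nu i_\alpha} - \bar a^g_\nu)^2}{\varepsilon \max_{1 \le g \le m} \Big[\lambda_g^2 \sum_{i=1}^{N_\nu} (a^g_{\nu i} - \bar a^g_\nu)^2\Big]} \le \dfrac{m}{\varepsilon} \sum_{g=1}^{m} \dfrac{\sum_{\alpha=1}^{k_\nu} (a^g_{\nu i_\alpha} - \bar a^g_\nu)^2}{\sum_{i=1}^{N_\nu} (a^g_{\nu i} - \bar a^g_\nu)^2}.$

Now it suffices to note that, according to our suppositions, the values $a^g_{\nu i}$ fulfill
the condition (4.10).

Remark 7.1. The condition (7.2) simply means that all multiple correlation
coefficients are uniformly bounded away from 1.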


REFERENCES
[1] A. Wald and J. Wolfowitz, "Statistical tests based on permutations of the
observations," Ann. Math. Stat., Vol. 15 (1944), pp. 358-372.
[2] W. G. Madow, "On the limiting distributions of estimates based on samples from finite
universes," Ann. Math. Stat., Vol. 19 (1948), pp. 535-545.
[3] G. E. Noether, "On a theorem by Wald and Wolfowitz," Ann. Math. Stat., Vol. 20
(1949), pp. 455-458.
[4] W. Hoeffding, "A combinatorial central limit theorem," Ann. Math. Stat., Vol. 22
(1951), pp. 558-566.
[5] W. Hoeffding, "On the distribution of the expected values of the order statistics,"
Ann. Math. Stat., Vol. 24 (1953), pp. 93-100.
[6] M. Dwass, "On the asymptotic normality of certain rank order statistics," Ann. Math.
Stat., Vol. 24 (1953), pp. 303-306.
[7] M. Dwass, "On the asymptotic normality of some statistics used in non-parametric
tests," Ann. Math. Stat., Vol. 26 (1955), pp. 334-339.
[8] Minoru Motoo, "On the Hoeffding's combinatorial central limit theorem," Ann.
Inst. Stat. Math., Vol. 8 (1957), pp. 145-154.
[9] Paul Erdős and Alfréd Rényi, "On a central limit theorem for samples from a finite
population," Publ. Math. Inst. Hung. Acad. Sci., Vol. 4 (1959), pp. 49-61.
[10] J. Hájek, "Limiting distributions in simple random sampling from a finite
population," Publ. Math. Inst. Hung. Acad. Sci., Vol. 5 (1960), pp. 361-374.
[11] B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent
Random Variables, Addison-Wesley, Cambridge, 1954.
[12] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press,
Princeton, 1946.
[13] J. L. Doob, Stochastic Processes, John Wiley & Sons, New York, 1953.
[14] M. Dwass, "The large sample power of rank order tests in the two-sample problem,"
Ann. Math. Stat., Vol. 27 (1956), pp. 352-374.





THE GAP TEST FOR RANDOM SEQUENCES
By Eve Bofinger¹ and V. J. Bofinger²
North Carolina State College


Summary. This paper is concerned with the gap test for random sequences,
first proposed by Kendall and Babington-Smith [7], and with various extensions
to this test. One of these extensions is the test proposed by Meyer, Gephart and
Rasmussen [8], another is, asymptotically, a partitioning of the $\chi^2$ statistic of
Kendall and Babington-Smith [7], and others are likelihood ratio tests based on
Markov chain models.


Notation. Consider a long chain of observations, $a_1, a_2, \ldots, a_N$, arising from
a Markov process of order $\nu - 1$, with two states denoted by 0 and 1, and with
positive transition probabilities $p_{r_1 \cdots r_\nu}$. That is, $p_{r_1 \cdots r_\nu}$ is the conditional
probability of the state $r_\nu$ given that the preceding $\nu - 1$ states are $r_1, r_2, \ldots, r_{\nu - 1}$.
Assume the process starts in a stationary state (although this can easily be
seen not to affect the asymptotic results given), so that the occupation
probabilities $P(r_1 \cdots r_{\nu - 1})$ may be derived from the transition probabilities by the
relation $\sum_{r_1} P(r_1 \cdots r_{\nu - 1})\, p_{r_1 \cdots r_\nu} = P(r_2 \cdots r_\nu)$.

Let $n_{r_1 \cdots r_t}$ be the number of times that $r_1, \ldots, r_t$ appears as a connected
sequence within the observation chain $a_1, \ldots, a_N$. In the case where $\nu = 1$
(the case of independent observations or random binary numbers) it seems
reasonable (so that the algebra will be tidier) to follow Kendall and Babington-Smith
[7] and consider a "cyclic" sequence in which $a_{i+N} = a_i$. In this case we
define $n_{r_1 \cdots r_t}$ as the number of times $r_1, \ldots, r_t$ appears as a connected sequence
in one cycle of the observation chain. The difference in $n_{r_1 \cdots r_t}$ under the cyclic
or non-cyclic definitions is at most $t - 1$.

The gap test is concerned with the number of non-zero digits (all denoted
here by 1) between zero digits, but we could easily apply the results to gaps
between any particular class of digits, for example, between even digits. For
random decimal digits we have $\nu = 1$ and $p_0 = 0.1$ while $p_1 = 0.9$.

Let $N_x = n_{r_1 r_2 \cdots r_{x+1} r_{x+2}}$ and $M_x = n_{r_1 r_2 \cdots r_{x+1}}$, where $r_1 = r_{x+2} = 0$ and $r_2 = \cdots
= r_{x+1} = 1$. That is, if there are at least two zeros in the sequence $a_1, \ldots, a_N$,
then $N_x$ is the number of gaps of length $x$ and, in the cyclic case, $M_x$ is the number
of gaps of length $x$ or greater.
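
The counts described here can be obtained by cyclic pattern matching. The sketch below treats the sequence as cyclic, counts occurrences of the pattern $0\,1^x\,0$ (gaps of length exactly $x$) and of the pattern $0\,1^x$ (gaps of length $x$ or greater); the helper names and the short example sequence are assumptions for illustration only.

```python
def count_cyclic_pattern(seq, pattern):
    """Number of start positions in one cycle of seq at which pattern occurs (cyclically)."""
    n = len(seq)
    return sum(
        all(seq[(i + k) % n] == pattern[k] for k in range(len(pattern)))
        for i in range(n)
    )


def gap_counts(seq, x):
    """Return (N_x, M_x): gaps of length exactly x, and gaps of length x or greater."""
    n_x = count_cyclic_pattern(seq, [0] + [1] * x + [0])
    m_x = count_cyclic_pattern(seq, [0] + [1] * x)
    return n_x, m_x


# Hypothetical binary sequence (1 stands for any non-zero digit), read cyclically.
example = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
for x in range(5):
    print(x, gap_counts(example, x))
```

In the example the four zeros are separated cyclically by gaps of lengths 2, 0, 1 and 3, so each of $N_0, \ldots, N_3$ equals 1 and $M_x$ decreases from 4 to 0.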

Since we may wish to pool some classes to give "reasonably large
expectations", let

$N_{x, x+s} = \sum_{i=0}^{s} N_{x+i}, \qquad M_{x, x+s} = \sum_{i=0}^{s} M_{x+i}.$

Received January 19, 1959; revised May 26, 1960.
1 At present at the University of Sydney, Sydney, N.S.W., Australia.
2 At present at the University of New South Wales, Kensington, N.S.W., Australia.

For small values of $x$, $s$ will probably be taken as zero and may always be
taken as zero if we are concerned only with gaps of small size rather than all
possible gap sizes. We shall, of course, choose values of $x$ and $s$ so that we have
non-overlapping classes.

Since we shall mainly be concerned with an independent sequence, we take
$p_0 = p$ and $p_1 = q$. Let $g_x = p q^x$ and $g_{x, x+s} = \sum_{i=0}^{s} g_{x+i} = q^x (1 - q^{s+1})$.

In what follows, $\sum$, with no superscripts or subscripts, is taken to mean
summation over all appropriate pairs of values of $x$ and $s$. That is, the summation is
over all classes considered, which may be numbered $0, 1, \ldots, L$.


Formulation of the Problem. We shall consider the following possible tests of the null hypothesis $p_{r_1 \cdots r_t} = p_{r_t} = p$ when $r_t = 0$, where $p$ has a specified value:

Test A: Kendall and Babington-Smith [7] suggested using

$$X_A^2 = \sum \frac{(N_{x \to x+s} - n_0\, g_{x \to x+s})^2}{n_0\, g_{x \to x+s}}$$

which they considered to be asymptotically (with respect to $N$) distributed as $\chi^2$ with $L$ degrees of freedom, where the $L + 1$ classes include all possible gap sizes, the last class including all those from a certain convenient finite size up to size $N - 2$.

Test B: Meyer, Gephart and Rasmussen [8] have suggested their "strong" gap test using

$$X_B^2 = \sum \frac{(N_{x \to x+s} - Np\, g_{x \to x+s})^2}{Np\, g_{x \to x+s}}$$

which they have taken to be asymptotically distributed as $\chi^2$ with $L + 1$ degrees of freedom. We shall show that this is not quite the case.

Test C: We suggest that it may be more useful to base a test on

$$X_C^2 = \sum \frac{(N_{x \to x+s} - pM_{x \to x+s})^2}{Npq\, g_{x \to x+s}}.$$

We shall show that this is asymptotically distributed as $\chi^2$ with $L + 1$ degrees of freedom, and, in fact, that for each of the $L + 1$ classes

$$\frac{(N_{x \to x+s} - pM_{x \to x+s})^2}{Npq\, g_{x \to x+s}}$$

is asymptotically distributed as $\chi^2$ with 1 degree of freedom.


An alternative procedure is to use

$$\frac{(N_{x \to x+s} - pM_{x \to x+s})^2}{pq\, M_{x \to x+s}}$$

which, by Cramér [5], 20.6, is asymptotically equivalent (that is, has the same asymptotic distribution) under the null hypothesis.

The advantage of test C is that we may examine each of these separate contributions, since these are asymptotically independent. It is difficult, however, to relate these separate contributions to likelihood ratio tests in the way that (as will be seen later) we can relate the total $X_C^2$.
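
As a rough illustration of how the three statistics differ only in their centring and scaling, the following sketch evaluates them for a cyclic 0/1 sequence. The function names, the class list of `(x, s)` pairs and the sample data are assumptions for illustration; pooled counts are taken as sums of the unpooled $N_{x+i}$ and $M_{x+i}$, and $g_{x \to x+s} = q^x(1 - q^{s+1})$.

```python
def pattern_count(seq, pattern):
    """Cyclic count of `pattern` in `seq`."""
    n = len(seq)
    return sum(
        all(seq[(i + k) % n] == pattern[k] for k in range(len(pattern)))
        for i in range(n)
    )

def gap_statistics(seq, p, classes):
    """Return (X_A^2, X_B^2, X_C^2) for classes given as (x, s) pairs."""
    q = 1.0 - p
    N = len(seq)
    n0 = seq.count(0)                      # number of zeros
    xa = xb = xc = 0.0
    for x, s in classes:
        # Pooled counts N_{x->x+s} and M_{x->x+s}
        n_xs = sum(pattern_count(seq, [0] + [1] * (x + i) + [0]) for i in range(s + 1))
        m_xs = sum(pattern_count(seq, [0] + [1] * (x + i)) for i in range(s + 1))
        g = q**x * (1.0 - q**(s + 1))      # g_{x->x+s}
        xa += (n_xs - n0 * g) ** 2 / (n0 * g)
        xb += (n_xs - N * p * g) ** 2 / (N * p * g)
        xc += (n_xs - p * m_xs) ** 2 / (N * p * q * g)
    return xa, xb, xc

# Hypothetical data: a short sequence and two non-overlapping classes.
seq = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
xa, xb, xc = gap_statistics(seq, 0.5, [(0, 0), (1, 1)])
print(xa, xb, xc)
```

A sequence this short is far from the asymptotic regime; the example only shows the arithmetic.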


Asymptotic Distributions of $X_A^2$, $X_B^2$ and $X_C^2$. Now Bartlett [2] has shown that the $N^{-1/2}[n_{r_1 \cdots r_t} - E(n_{r_1 \cdots r_t})]$ for various $r_1, \ldots, r_t$ have asymptotically a joint normal distribution. Hence each of the sets of variables

a. $N^{-1/2}(N_{x \to x+s} - n_0\, g_{x \to x+s})$
b. $N^{-1/2}(N_{x \to x+s} - Np\, g_{x \to x+s})$
c. $N^{-1/2}(N_{x \to x+s} - pM_{x \to x+s})$

for various finite values of $x$ and $s$, being linear combinations (or very nearly so in the non-cyclic case) of the $N^{-1/2}[n_{r_1 \cdots r_t} - E(n_{r_1 \cdots r_t})]$, for $t$ at least as large as the largest value of $x + s + 2$ considered, have asymptotically a joint normal distribution. We may easily see that the expected value of any one of the three variables (labelled a, b, or c) is zero and so we only need to find the variance-covariance matrix for the above sets of variables to find the asymptotic distributions of $X_A^2$, $X_B^2$ and $X_C^2$.
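Following Billingsley [3] we let

$$\alpha_i = \begin{cases} 1 & \text{if the sequence } a_i, a_{i+1}, \ldots, a_{i+t-1} \text{ is the sequence } r_1, r_2, \ldots, r_t \\ 0 & \text{otherwise} \end{cases}$$

and let

$$\beta_j = \begin{cases} 1 & \text{if the sequence } a_j, a_{j+1}, \ldots, a_{j+t+K-1} \text{ is the sequence } s_1, s_2, \ldots, s_{t+K}, \text{ where } K \text{ is a finite non-negative integer} \\ 0 & \text{otherwise.} \end{cases}$$

Now $n_{r_1 \cdots r_t} = \sum_{i=1}^{N} \alpha_i$ (or $\sum_{i=1}^{N-t+1} \alpha_i$ for the non-cyclic case) and $n_{s_1 \cdots s_{t+K}} = \sum_{j=1}^{N} \beta_j$. Hence

$$\operatorname{Cov}(n_{r_1 \cdots r_t},\, n_{s_1 \cdots s_{t+K}}) = \sum_{i,j} \operatorname{Cov}(\alpha_i, \beta_j).$$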

The evaluation of this expression is greatly simplified in the case we are considering of an independent sequence and is further simplified here, since we consider such sequences as $r_1, \ldots, r_t$ with $r_1 = r_t = 0$ and $r_2 = \cdots = r_{t-1} = 1$.

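We find that

$$\operatorname{Var}(N_x) = Np^2q^x\bigl(1 + 2pq^x - (2x + 3)p^2q^x\bigr),$$

$$\operatorname{Cov}(N_x, N_y) = Np^3q^{x+y}\bigl(2 - (x + y + 3)p\bigr)$$

and

$$\operatorname{Cov}(N_x, n_0) = Np^2q^x\bigl(2 - (x + 2)p\bigr).$$

Hence

$$\operatorname{Var}(N_x - n_0 g_x) = Np\, g_x(1 - g_x)$$

and

$$\operatorname{Cov}(N_x - n_0 g_x,\, N_y - n_0 g_y) = -Np\, g_x g_y.$$

Hence

$$\operatorname{Var}(N_{x \to x+s} - n_0\, g_{x \to x+s}) = Np\, g_{x \to x+s}(1 - g_{x \to x+s})$$

and

$$\operatorname{Cov}(N_{x \to x+s} - n_0\, g_{x \to x+s},\, N_{y \to y+t} - n_0\, g_{y \to y+t}) = -Np\, g_{x \to x+s}\, g_{y \to y+t}.$$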
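
The step from the three raw moments to the centred variance and covariance is pure algebra, and can be checked with exact rational arithmetic. The sketch below is an illustration, not the authors' derivation: the function names are assumptions, all quantities are divided by $N$, and $\operatorname{Var}(n_0)/N = pq$ is used for an independent sequence.

```python
from fractions import Fraction

def var_N(p, x):
    """Var(N_x) / N."""
    q = 1 - p
    return p**2 * q**x * (1 + 2 * p * q**x - (2 * x + 3) * p**2 * q**x)

def cov_NN(p, x, y):
    """Cov(N_x, N_y) / N for x != y."""
    q = 1 - p
    return p**3 * q**(x + y) * (2 - (x + y + 3) * p)

def cov_Nn0(p, x):
    """Cov(N_x, n_0) / N."""
    q = 1 - p
    return p**2 * q**x * (2 - (x + 2) * p)

def var_centred(p, x):
    """Var(N_x - n_0 g_x) / N assembled from the moments above."""
    q = 1 - p
    g = p * q**x
    return var_N(p, x) - 2 * g * cov_Nn0(p, x) + g * g * p * q

def cov_centred(p, x, y):
    """Cov(N_x - n_0 g_x, N_y - n_0 g_y) / N assembled from the moments above."""
    q = 1 - p
    gx, gy = p * q**x, p * q**y
    return cov_NN(p, x, y) - gy * cov_Nn0(p, x) - gx * cov_Nn0(p, y) + gx * gy * p * q

p = Fraction(1, 10)          # decimal digits: p = 0.1, q = 0.9
g = [p * (1 - p)**x for x in range(5)]
print(var_centred(p, 2) == p * g[2] * (1 - g[2]))
print(cov_centred(p, 1, 3) == -p * g[1] * g[3])
```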


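The variance covariance matrix of the $N_x - pM_x$ may be obtained by noticing that

$$\operatorname{Cov}(n_{r_1 \cdots r_t} - p_{r_t} n_{r_1 \cdots r_{t-1}},\; n_{s_1 \cdots s_t} - p_{s_t} n_{s_1 \cdots s_{t-1}}) = N\, \delta^{r_1 \cdots r_{t-1}}_{s_1 \cdots s_{t-1}}\, p_{r_1} \cdots p_{r_t} (\delta_{r_t s_t} - p_{s_t})$$

where

$$\delta^{r_1 \cdots r_{t-1}}_{s_1 \cdots s_{t-1}} = \begin{cases} 1 & \text{if } r_1 = s_1, \ldots, r_{t-1} = s_{t-1} \\ 0 & \text{otherwise.} \end{cases}$$

This may be seen by a slight modification of the work of Anderson and Goodman [1] or by expressing the above covariance as $\sum_{i,j} f(i, j)$, where $f(i, j)$ contains four terms of the type $\operatorname{Cov}(\alpha_i, \beta_j)$ corresponding to the four terms of the product

$$(n_{r_1 \cdots r_t} - p_{r_t} n_{r_1 \cdots r_{t-1}})(n_{s_1 \cdots s_t} - p_{s_t} n_{s_1 \cdots s_{t-1}}).$$

We find that $f(i, j)$ is zero unless $i = j$ and that

$$f(i, i) = \delta^{r_1 \cdots r_{t-1}}_{s_1 \cdots s_{t-1}}\, p_{r_1} \cdots p_{r_t} (\delta_{r_t s_t} - p_{s_t}).$$

Hence

$$\operatorname{Cov}(n_{r_1 \cdots r_t} - p_{r_t} n_{r_1 \cdots r_{t-1}},\; n_{s_1 \cdots s_{t+K}} - p_{s_{t+K}} n_{s_1 \cdots s_{t+K-1}}) = N\, \delta^{r_1 \cdots r_{t-1}}_{s_{K+1} \cdots s_{t+K-1}}\, p_{s_1} \cdots p_{s_{t+K}} (\delta_{r_t s_{t+K}} - p_{r_t})$$

and so

$$\operatorname{Cov}(N_x - pM_x,\, N_y - pM_y) = 0 \quad \text{for } x \ne y.$$

Hence

$$\operatorname{Cov}(N_{x \to x+s} - pM_{x \to x+s},\, N_{y \to y+t} - pM_{y \to y+t}) = 0 \quad \text{for } x + s < y.$$

Also

$$\operatorname{Var}(N_x - pM_x) = Np^2q^{x+1}$$

and hence

$$\operatorname{Var}(N_{x \to x+s} - pM_{x \to x+s}) = Npq\, g_{x \to x+s}.$$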
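
For a short cyclic sequence these two results can be checked by brute force, summing over every 0/1 sequence with its probability. The sketch below is an assumption-laden illustration (the names `pattern_count` and `exact_cov`, the length 8 and $p = 1/4$ are all chosen for the example); it uses exact fractions, so the comparisons are equalities rather than approximations.

```python
from fractions import Fraction
from itertools import product

def pattern_count(seq, pattern):
    """Cyclic count of `pattern` in `seq`."""
    n = len(seq)
    return sum(
        all(seq[(i + k) % n] == pattern[k] for k in range(len(pattern)))
        for i in range(n)
    )

def exact_cov(N, p, x, y):
    """Exact Cov(N_x - p M_x, N_y - p M_y) over all cyclic 0/1 sequences of length N."""
    q = 1 - p
    e_u = e_v = e_uv = Fraction(0)
    for seq in product((0, 1), repeat=N):
        zeros = seq.count(0)
        weight = p**zeros * q**(N - zeros)        # probability of this sequence
        u = pattern_count(seq, (0,) + (1,) * x + (0,)) - p * pattern_count(seq, (0,) + (1,) * x)
        v = pattern_count(seq, (0,) + (1,) * y + (0,)) - p * pattern_count(seq, (0,) + (1,) * y)
        e_u += weight * u
        e_v += weight * v
        e_uv += weight * u * v
    return e_uv - e_u * e_v

p = Fraction(1, 4)
q = 1 - p
print(exact_cov(8, p, 1, 2))                      # covariance between different gap sizes
print(exact_cov(8, p, 1, 1), 8 * p**2 * q**2)     # variance against N p^2 q^(x+1)
```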


Now we may find the asymptotic distributions of $X_A^2$, $X_B^2$ and $X_C^2$.

Test A: Consider classes numbered $0, 1, 2, \ldots, L$ with associated variates $f_0, f_1, f_2, \ldots, f_L$ such that for $i, j = 0, 1, 2, \ldots, L$ we have $E(f_i) = 0$, $\operatorname{Var}(f_i) = Kp_i(1 - p_i)$ and $\operatorname{Cov}(f_i, f_j) = -Kp_ip_j$ for $i \ne j$, where $p_i$ is a positive number with $\sum_{i=0}^{L} p_i = P \le 1$.

We may easily show that the variance covariance matrix of $z_i = f_i/(Kp_i)^{1/2}$ has $L$ latent roots equal to 1 and one equal to $1 - P$.

To apply this to the variates $N^{-1/2}(N_{x \to x+s} - n_0\, g_{x \to x+s})$ we notice that $P = 1 - q^{N-1}$ if all gap sizes are included and hence, by Cochran [4],

$$\sum \frac{(N_{x \to x+s} - n_0\, g_{x \to x+s})^2}{Np\, g_{x \to x+s}}$$

(and also, by Cramér [5], 20.6,

$$\sum \frac{(N_{x \to x+s} - n_0\, g_{x \to x+s})^2}{n_0\, g_{x \to x+s}}\,)$$

is asymptotically distributed as $\chi^2$ with $L$ degrees of freedom.

This may also be seen by usual multinomial theory.

If, however, not all gap sizes are considered, but only, say, those of sizes 0 to $k$ in the $L + 1$ classes ($L \le k$), then $L$ latent roots still equal 1 but the $(L + 1)$th equals $q^{k+1}$. Provided $k$ is large enough this would have a small effect, but in dealing with decimal numbers $q = 0.9$ and with $L = k = 4$, say, the fifth latent root is 0.59, which constitutes an appreciable effect.
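
The figure 0.59 can be reproduced numerically. The sketch below assumes the covariance matrix of the $z_i$ has the form identity minus the outer product of the vector $(p_i^{1/2})$ with itself, which follows from the stated variances and covariances; the helper names are illustrative. It checks that this vector is a latent vector with latent root $1 - P = q^{k+1}$.

```python
import math

def z_covariance(probs):
    """Covariance matrix of z_i = f_i / sqrt(K p_i): identity minus sqrt(p) sqrt(p)^T."""
    r = [math.sqrt(pi) for pi in probs]
    n = len(probs)
    return [[(1.0 if i == j else 0.0) - r[i] * r[j] for j in range(n)] for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

p, q, k = 0.1, 0.9, 4
probs = [p * q**x for x in range(k + 1)]       # g_x for gap sizes 0..k, no pooling
V = z_covariance(probs)
root = [math.sqrt(g) for g in probs]
image = matvec(V, root)
ratios = [im / r for im, r in zip(image, root)]  # each ratio is the latent root
print(round(ratios[0], 5), round(q**(k + 1), 5))
```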

Test B: From the variances and covariances determined above we find that

$$\operatorname{Var}\{N_x/(Np\, g_x)^{1/2}\} = 1 + 2pq^x - (2x + 3)p^2q^x$$

and

$$\operatorname{Cov}\{N_x/(Np\, g_x)^{1/2},\, N_y/(Np\, g_y)^{1/2}\} = pq^{(x+y)/2}\bigl[2 - (x + y + 3)p\bigr].$$

First let us consider the case where no pooling takes place. That is, we consider the statistic

$$\sum_{x=0}^{L} \frac{(N_x - Np^2q^x)^2}{Np^2q^x}.$$

We can easily find that the corresponding variance-covariance matrix has $L - 1$ of its latent roots equal to 1 and, after some algebra, we find that the remaining 2 latent roots are given by the expressions $\tfrac{1}{2}(\alpha \pm \beta)$, where

$$\alpha = 1 + q + (2L + 3)pq^{L+1}$$

and

$$\beta = \bigl\{(1 - q^{L+1})\bigl[(1 + q)^2(1 - q^{L+1}) - 4(L + 1)(L + 2)p^2q^{L+1}\bigr]\bigr\}^{1/2}.$$
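
The two non-unit latent roots can be checked against the trace and determinant of the matrix, since the trace must equal $(L - 1) + \alpha$ and the determinant must equal $(\alpha^2 - \beta^2)/4$ when the other $L - 1$ roots are 1. The sketch below is a consistency check under assumptions: it rescales rows and columns by powers of $q^{1/2}$ (a similarity transformation, which leaves trace and determinant unchanged) so that all entries are rational, and the function names are illustrative.

```python
from fractions import Fraction

def similar_matrix(p, L):
    """Rational matrix similar to the covariance matrix of N_x/(N p g_x)^(1/2), x = 0..L."""
    q = 1 - p
    return [[(1 if x == y else 0) + p * q**y * (2 - (x + y + 3) * p)
             for y in range(L + 1)] for x in range(L + 1)]

def determinant(A):
    """Determinant by Gaussian elimination in exact arithmetic."""
    A = [row[:] for row in A]
    n, det = len(A), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det
        det *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return det

def alpha_beta_sq(p, L):
    """Return (alpha, beta^2) from the expressions in the text."""
    q = 1 - p
    Q = q**(L + 1)
    alpha = 1 + q + (2 * L + 3) * p * Q
    beta_sq = (1 - Q) * ((1 + q)**2 * (1 - Q) - 4 * (L + 1) * (L + 2) * p**2 * Q)
    return alpha, beta_sq

p, L = Fraction(1, 10), 4
W = similar_matrix(p, L)
alpha, beta_sq = alpha_beta_sq(p, L)
print(sum(W[i][i] for i in range(L + 1)) == L - 1 + alpha)
print(determinant(W) == (alpha**2 - beta_sq) / 4)
```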


If we pool so that the $L$th class consists of all those gaps of length $L$ or greater the variance-covariance matrix is modified in the following way. Let

$$N_* = N_L + N_{L+1} + \cdots + N_{N-2}, \qquad M_* = M_L + M_{L+1} + \cdots + M_{N-2}$$

and $g_* = g_L + g_{L+1} + \cdots + g_{N-2}$.

In practice, many of the components of $N_*$ will be small or zero and $N_*$ may be calculated as $n_0 - \sum_{x=0}^{L-1} N_x$.

We have considered the above method for pooling classes since it seems to be a reasonable one and any general pooling scheme as in Test A is too awkward algebraically.

The asymptotic expected value of $N_*$ is $Npq^L$ and so the test statistic is now

$$\sum_{x=0}^{L-1} \frac{(N_x - Np^2q^x)^2}{Np^2q^x} + \frac{(N_* - Npq^L)^2}{Npq^L}.$$

We can easily show that $\operatorname{Var}\{N_*/(Npq^L)^{1/2}\}$ is asymptotically equal to $1 - (2L + 1)pq^L$ and that

$$\operatorname{Cov}\{N_x/(Np^2q^x)^{1/2},\, N_*/(Npq^L)^{1/2}\}$$

is asymptotically equal to

$$p^{1/2}q^{(x+L)/2}\bigl(2 - (L + x + 3)p - q\bigr).$$

It follows that the corresponding variance-covariance matrix has $L - 1$ of its latent roots equal to 1 as before, and, after some algebra, has the remaining 2 latent roots equal to

$$\tfrac{1}{2}(1 + q)\bigl\{1 \pm \bigl(1 - 4q^{L+1}(1 + q)^{-2}\bigr)^{1/2}\bigr\}.$$

This means that, for large $L$, one of the latent roots is approximately $1 + q$ and the other is approximately zero.
Test C: For each class denoted by $x \to x + s$,

$$\frac{(N_{x \to x+s} - pM_{x \to x+s})^2}{Npq\, g_{x \to x+s}}$$

is asymptotically independent of similar contributions from the other $L$ classes and is asymptotically distributed as $\chi^2$ with 1 degree of freedom. Hence $X_C^2$ is asymptotically distributed as $\chi^2$ with $L + 1$ degrees of freedom.


Discussion of Tests A, B and C. In the following we restrict ourselves to the case where we consider gaps of sizes $0, 1, 2, \ldots, L - 1$ and pool gaps of size $L$ or greater.

Now

$$\sum_{x=0}^{L-1} \frac{(N_x - n_0 g_x)^2}{n_0 g_x} + \frac{(N_* - n_0 g_*)^2}{n_0 g_*} = \sum_{x=0}^{L-1} \frac{(N_x - pM_x)^2}{n_0 q\, g_x}.$$

This can easily be seen to be true for $L = 1$ and may be proved by induction, remembering that $M_L = N_*$. Hence $X_{C*}^2$ is asymptotically equivalent to $X_A^2$. Asymptotically then, $X_{C*}^2$ is a partitioning of $X_A^2$ into independent $\chi^2$ variates, each with one degree of freedom. Notice that $X_{C*}^2$ is a modified form of $X_C^2$.

Also we may show (using the asymptotic likelihood found by Bartlett [2]) that the likelihood ratio test of the null hypothesis

$$p_{r_1 \cdots r_{L+1}} = p \ \text{(a specified value) if } r_{L+1} = 0$$

against the alternative

$$p_{r_1 \cdots r_{L+1}} = p_{r_x \cdots r_{L+1}} \ne p \ \text{ if } r_{L+1} = r_x = 0 \text{ and } r_{x+1} = \cdots = r_L = 1 \text{ for } x = 1, 2, \ldots, L$$

(where these probabilities are unspecified)

$$= p \ \text{(specified) if } r_1 = \cdots = r_L = 1 \text{ and } r_{L+1} = 0,$$

is given by

$$-2 \log \lambda = 2 \sum_{x=0}^{L-1} \Bigl\{ N_x \log\Bigl(\frac{N_x}{pM_x}\Bigr) + (M_x - N_x) \log\Bigl(\frac{M_x - N_x}{qM_x}\Bigr) \Bigr\}.$$

This may be shown (using methods similar to those used by Anderson and Goodman [1]) to be asymptotically equivalent, under the null hypothesis, to

$$\sum_{x=0}^{L-1} \frac{(N_x - pM_x)^2}{pqM_x},$$

which is asymptotically equivalent to $X_{C*}^2$ and hence to $X_A^2$.
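In the case where the null hypothesis does not specify the value $p$ the likelihood ratio test for the null and alternative hypotheses above is given by

$$-2 \log \lambda = 2 \sum_{x=0}^{L-1} \Bigl\{ N_x \log\Bigl(\frac{N_x N}{n_0 M_x}\Bigr) + M_{x+1} \log\Bigl(\frac{M_{x+1} N}{(N - n_0) M_x}\Bigr) \Bigr\} + 2N_* \log\Bigl(\frac{N_* N}{n_0 M_*}\Bigr) + 2(M_* - N_*) \log\Bigl(\frac{(M_* - N_*) N}{(N - n_0) M_*}\Bigr)$$

which, under the null hypothesis, is asymptotically distributed as $\chi^2$ on $L$ degrees of freedom and is asymptotically equivalent to

$$X_D^2 = \sum_{x=0}^{L-1} \frac{(N_x - \hat{p} M_x)^2}{\hat{p}\hat{q} M_x} + \frac{(N_* - \hat{p} M_*)^2}{\hat{p}\hat{q} M_*}$$

where $\hat{p} = 1 - \hat{q} = n_0/N$.

It is interesting to note that

$$X_D^2 = \sum_{x=0}^{L-1} \frac{(N_x - pM_x)^2}{\hat{p}\hat{q} M_x} + \frac{(N_* - pM_*)^2}{\hat{p}\hat{q} M_*} - \frac{(n_0 - Np)^2}{N\hat{p}\hat{q}}.$$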


Under the null hypothesis

$$p_{r_1 \cdots r_{L+1}} = p \ \text{(specified) if } r_{L+1} = 0$$

(which is the null hypothesis for the first likelihood ratio test considered) this is asymptotically equivalent to

$$\sum_{x=0}^{L-1} \frac{(N_x - pM_x)^2}{Npq\, g_x} + \frac{(N_* - pM_*)^2}{Npq\, g_*} - \frac{(n_0 - Np)^2}{Npq} = X_C^2 - \frac{(n_0 - Np)^2}{Npq}.$$
It may be seen that $X_C^2$ is asymptotically equivalent, under the null hypothesis

$$p_{r_1 \cdots r_{L+1}} = p \ \text{(specified) if } r_{L+1} = 0,$$

to the likelihood ratio test of this null hypothesis against the alternative hypothesis

$$p_{r_1 \cdots r_{L+1}} = p_{r_x \cdots r_{L+1}} \ne p \ \text{ if } r_{L+1} = r_x = 0 \text{ and } r_{x+1} = \cdots = r_L = 1 \text{ for } \ldots, L.$$

(Notice that this is a slight modification of the first likelihood ratio test considered, the only difference being that the alternative hypothesis here does not specify $p_{r_1 \cdots r_{L+1}}$ where $r_1 = r_2 = \cdots = r_L = 1$.)

The result for the asymptotic distribution of $X_B^2$ may be illustrated by noting that $X_B^2 = X_{C*}^2 + q(n_0 - Np)^2/Npq$.

Now $(n_0 - Np)^2/Npq$ and $X_{C*}^2$ are asymptotically distributed as $\chi^2$ variates with 1 and $L$ degrees of freedom respectively. However, these two $\chi^2$ variates are not asymptotically independent. In fact we may see that $X_C^2$ may be partitioned as follows:

$$X_{C*}^2 + \frac{(N_* - pM_*)^2}{Npq\, g_*} = \sum_{x=0}^{L-1} \frac{\bigl(N_x - pM_x - g_x(n_0 - Np)\bigr)^2}{Npq\, g_x} + \frac{(n_0 - Np)^2}{Npq} + \frac{\bigl(N_* - pM_* - g_*(n_0 - Np)\bigr)^2}{Npq\, g_*}$$

and for large values of $L$, and hence small values of $g_*$, the last terms on either side of this equation are approximately asymptotically equivalent and the first term on the right hand side is asymptotically distributed as an approximate $\chi^2$ variate on $L - 1$ degrees of freedom, independently of $(n_0 - Np)^2/Npq$.

Hence for large values of $L$, the asymptotic distribution of $X_B^2$ is approximately that of a $\chi^2$ variate on $L - 1$ degrees of freedom plus $(1 + q)$ multiplied by an independent $\chi^2$ variate on 1 degree of freedom, this last $\chi^2$ variate arising from the term $(n_0 - Np)^2/Npq$.

This explains the result for the latent roots of the variance-covariance matrix associated with $X_B^2$. $L - 1$ of these latent roots are equal to 1 and, for large $L$, one is approximately equal to zero and the last is approximately equal to $1 + q$.

The approximation in the above is asymptotically $O_p(q^L)$ and $q^L$ may not be particularly small in cases of interest. We are unlikely to be interested in extremely large values of $L$.







Extension of Test C to the case of Dependent Sequences. We consider now a non-cyclic dependent sequence. It will be easier to find first the likelihood ratio test and then the related $\chi^2$ test, which may be regarded as an extension of Test C.


Consider the null hypothesis 


Dry-rpas = Propet thar = P (specified) if rosy = --- = re = land 
roii = O (where yu is a fixed integer and 
0s 24s L) 

against the alternative 


Pryetpas = Proce thar AP frre = Orie = °°? = TL = Landriy = 0 
andx=yutl,::-,Lb-—1 


and 


Pri--rp4, = P (specified) ifr; = --- r= landrz4, = 0. 
Then if \ is the appropriate likelihood ratio 


—2logy\ = 2 ‘ Sig — — 
°8 a en a (2 ee 


a ri-s-ru1 \\ 
baa fe " og ( Bettas rL— a) 


and under the null hypothesis this is asymptotically distributed as x* on L — uz 
degrees of freedom and is asymptotically equivalent to 


L-1 yy? 
V5.2 


=: pqMz,1 
— PN...rp_,--rz - and 


<1 Be Nensh Neoyeerp gabe" TL 
where fr. = fra. = Oand rz»: = --: r, = 1. 

Consider the hypothesis $p_{r_1 \cdots r_{L+1}} = p_{r_{L-\mu+1} \cdots r_{L+1}}$, where the probabilities are specified, and in the particular case $r_{L-\mu+1} = \cdots = r_L = 1$ and $r_{L+1} = 0$ let us denote $p_{r_{L-\mu+1} \cdots r_{L+1}}$ by $p$. That is, we are considering a Markov chain of order at most $\mu$ with specified transition probabilities.

Under this hypothesis the likelihood ratio test statistic above is asymptotically equivalent to

$$\sum_{x=\mu}^{L-1} \frac{Y_{x,L}^2}{Npq\, P(r_{L-x} \cdots r_L)},$$

which is asymptotically distributed as a $\chi^2$ variate with $L - \mu$ degrees of freedom. This is a possible extension of test C for $\mu$-dependent sequences. Also it may be shown that each term $Y_{x,L}^2 / (Npq\, P(r_{L-x} \cdots r_L))$ is asymptotically distributed as an independent $\chi^2$ variate on 1 degree of freedom.







However, the likelihood ratio test is perhaps preferable since it is not necessary for the likelihood ratio test to specify, under the null hypothesis, the value of $p_{r_1 \cdots r_{L+1}}$ where any one of $r_{L-\mu+1}, \ldots, r_L$ is zero.

A test that is perhaps more useful is the likelihood ratio test where the null hypothesis does not specify values of probabilities. That is, we test the null hypothesis:


Prureg: = Prossi-rey1 = P (unspecified) where ria = -** = re = 1 
and rz4i, = 0 


against the alternative 


Psoeg sy © Pee nis when riz = finn = O and rrp = 
= lforr=ypywtil,---,L-—1. 


In this case 


L—1 /( 
‘ ‘ J a sro 
—2 logy = 2 > {N...rp_g---r10 log 8 
zy | 


PN...rpie rh. 


Nr se-rpl | 
+ N...+,_9°--rz1 10g ( ee TL) 


QM..ry_er. 


+ 2n,,...2,0 log (S2-) 
PNs,---81. 


) 


+ 2n,,...,1 log (fe) 


where 3, = & = --: = 8, =1,rp. = 0, roe = °'' = 71 = 1. Alop=1—¢g 
= Nerep_ payee 0 / (Merz parr?) Where fipy = *°* = ry = 1. 

This is an obvious extension of the likelihood ratio test associated with $X_D^2$ and under the null hypothesis $-2 \log \lambda$ given here is asymptotically distributed as $\chi^2$ on $L - \mu$ degrees of freedom.

Some of the tests given above are anticipated in a statement by Goodman 
[6] on possible tests on his ,k; which is the same as our N, with x = 7 in the 
particular case 7 = 1, and evaluation of some of the variances and covariances is 
related to some work by Good [5a]. 


REFERENCES


[1] T. W. Anderson and Leo A. Goodman, "Statistical inference about Markov chains," Ann. Math. Stat., Vol. 28 (1957), pp. 89-110.

[2] M. S. Bartlett, "The frequency goodness of fit test for probability chains," Proc. Camb. Phil. Soc., Vol. 47 (1951), pp. 86-95.

[3] Patrick Billingsley, "Asymptotic distributions of two goodness of fit criteria," Ann. Math. Stat., Vol. 27 (1956), pp. 1123-1129.

[4] W. G. Cochran, "The distribution of quadratic forms in a normal system with application to the analysis of covariance," Proc. Camb. Phil. Soc., Vol. 30 (1934), pp. 178-191.

[5] Harald Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, 1946.

[5a] I. J. Good, "The serial test for sampling number and other tests for randomness," Proc. Camb. Philos. Soc., Vol. 49 (1953), pp. 276-284.

[6] Leo A. Goodman, "Simplified run tests and likelihood ratio tests for Markov chains," Biometrika, Vol. 45 (1958), pp. 181-197.

[7] M. G. Kendall and B. Babington-Smith, "Randomness and random sampling numbers," J. Roy. Stat. Soc., Vol. 101 (1938), pp. 147-166.

[8] H. A. Meyer, L. S. Gephart and N. L. Rasmussen, "On the generation and testing of random digits," WADC Technical Report 54-55, Wright-Patterson Air Force Base, Ohio, 1954.





THE MULTIVARIATE SADDLEPOINT METHOD AND CHI-SQUARED
FOR THE MULTINOMIAL DISTRIBUTION


By I. J. Good
Admiralty Research Laboratory, Teddington, Middlesex¹


1. Introduction. This paper is largely a continuation of Good [5], but the above 
title is more descriptive than the previous title would be. The contents are: 

(i) Further discussion of the saddlepoint theorem for coefficients in a power 
of a power series, especially in more than one variable. 

(ii) A generalization of Faà di Bruno's formula for the repeated differentiation
of a function of a function.

(iii) Some discussion of the relationship between moments and cumulants, 
especially bivariate ones. 

(iv) Some exemplification, but not a systematic exposition, of multivariate 
notations in analysis, which are less familiar than those used in algebra. 

(v) Corrections to the previous paper [5]. 

(vi) The results of some numerical trials of the method of calculating the 
distribution of chi-squared for an equiprobable multinomial distribution. 


2. Further Formalism and Discussion of the Saddlepoint Theorems. In this 
section I shall discuss certain formal aspects of the saddlepoint theorems given 
in Daniels [3] and Good [5]. (In order to minimize repetition, I shall assume that 


the reader has a copy of [5] ready to hand.) A part of the formalism involves 
the use of Hermite polynomials in one or more variables. When there is only one 
variable, Hermite functions are shown to be relevant, for example, by Jeffreys 
and Jeffreys [6], Para. 23.09. But that context is rather different from ours, and 
the method of proof, by partial integration, does not appear to be applicable 
when there is more than one variable. 

The formalism will shed further light on why it is desirable to make use of a 
saddlepoint of the integrand (or of a function closely related to the integrand). 

I wish to emphasize that the discussion is formal, and I have not investigated 
general conditions of validity and bounds for errors. In any specific application 
some attempt should be made to estimate the error, either analytically or by 
means of numerical experiments. 

I shall take the opportunity of correcting some slips in [5]. 

Let $M$ be the column vector whose components are $(M_1, \ldots, M_l)$, and let transposition be denoted by a "prime" or "dash", so that $M'$ is the corresponding row vector. If $\theta$ is another ($l$-dimensional column) vector, then $M'\theta$ represents the scalar product $M_1\theta_1 + \cdots + M_l\theta_l$, in accordance with the usual notation for matrix multiplication, a notation that will be used more generally. Let $c(M, t) = c(M_1, \ldots, M_l, t)$ be the coefficient of $z^M = z_1^{M_1} \cdots z_l^{M_l}$ in $(f(z))^t =$


Received September 29, 1959; revised October 29, 1960. 
¹ I am indebted to the Admiralty for permission to publish this paper.




$(f(z_1, \ldots, z_l))^t$, where $t$ is a positive integer, and where $M_j/t$ and $t/M_j$ are bounded ($j = 1, 2, \ldots, l$). The notation $x^M$ for a "scalar indicial" will be used more generally, for example, $\rho^M$ denotes $\rho_1^{M_1} \cdots \rho_l^{M_l}$. In accordance with a convention often used by physicists, $dz$, for example, will denote $dz_1\, dz_2 \cdots dz_l$. (The naturalness of this notation is illustrated by the suggestive notation $dz/d\theta$ for a Jacobian.) Our main purpose in this section is to develop formulae for $c(M, t)$. We start with

$$(1) \qquad c(M, t) = \frac{1}{(2\pi i)^l} \oint \cdots \oint \frac{(f(z))^t\, dz}{z_1^{M_1+1} \cdots z_l^{M_l+1}}$$

$$(2) \qquad\qquad = \frac{1}{(2\pi)^l \rho^M} \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} \bigl(f(\rho_1 e^{i\theta_1}, \ldots, \rho_l e^{i\theta_l})\bigr)^t e^{-iM'\theta}\, d\theta.$$

Let $r = (r_1, \ldots, r_l)'$ be a vector each component of which is a non-negative integer, and let $|r| = r_1 + \cdots + r_l$, $r! = r_1! \cdots r_l!$. Similar notation will be used for $s$ and $n$.

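Consider an (artificial) probability distribution of a random vector $X$, such that the probability is $p_n = p_{n_1, \ldots, n_l}$ that $X = n$; where $p_n$ has the probability generating function $x^{-M/t} f(\rho_1 x_1, \ldots, \rho_l x_l)/f(\rho)$. The corresponding moment generating function is

$$e^{-M'\xi/t} f(\rho_1 e^{\xi_1}, \ldots, \rho_l e^{\xi_l})/f(\rho).$$

Let the corresponding cumulants be $\kappa_r = \kappa_{r_1, \ldots, r_l}$, so that, if $|r| \ne 0$,

$$(3) \qquad \kappa_r = \prod_j \Bigl(\frac{\partial}{\partial \xi_j}\Bigr)^{r_j} \Bigl\{ -\frac{M'\xi}{t} - \log f(\rho) + \log f(\rho_1 e^{\xi_1}, \ldots, \rho_l e^{\xi_l}) \Bigr\} \Big|_{\xi = 0}.$$

The "order" of the cumulant $\kappa_r$ is defined as $|r|$. Note that $\kappa_0 = 0$, and that, if $|r| = 1$, the cumulants take the values $k_j$, where

$$(4) \qquad k_j = -\frac{M_j}{t} + \frac{\partial}{\partial \xi_j} \log f(\rho_1 e^{\xi_1}, \ldots, \rho_l e^{\xi_l}) \Big|_{\xi = 0} = -\frac{M_j}{t} + \rho_j \frac{\partial}{\partial \rho_j} \log f(\rho),$$

while, if $|r| \ge 2$,

$$\kappa_r = \prod_j \Bigl(\frac{\partial}{\partial \xi_j}\Bigr)^{r_j} \log f(\rho_1 e^{\xi_1}, \ldots, \rho_l e^{\xi_l}) \Big|_{\xi = 0} = \prod_j \Bigl(\rho_j \frac{\partial}{\partial \rho_j}\Bigr)^{r_j} \log f(\rho).$$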


Formally,

$$(8) \qquad c(M, t) =_F \frac{(f(\rho))^t}{(2\pi)^l \rho^M} \int \cdots \int \exp\Bigl\{ t \sum_r \frac{\kappa_r (i\theta)^r}{r!} \Bigr\}\, d\theta,$$

where $=_F$ means "equals formally". (I am here omitting the ranges of integration since the argument is only formal. Note that in [5] the factor $(f(\rho))^t$ was omitted in error twice on p. 872 and twice on page 869, and the factor $1/t$ was omitted once on page 872. Also, on page 872, "1+" in the heavy exponential should be deleted. These slips had no real effect on the argument or results.)

To evaluate $c(M, t)$ approximately we should like to know the saddlepoints of $f(z) z^{-M/t}$. Certainly there is a saddlepoint at $z = \rho$ if equations (6.20) of [5] are satisfied. These equations assert the vanishing of the first-order cumulants of our artificial distribution, i.e., they assert $k = 0$. We do not need to prove that there are no other saddlepoints provided that we can cope directly with our integral, (8), with respect to $\theta$. What we do know, by [5] p. 874, is that there is at most one "real and positive" saddlepoint, i.e., that equations (6.20) ($k = 0$) have at most one solution with $\rho_1 > 0, \ldots, \rho_l > 0$. But $\theta = 0$ is not necessarily the only important point in the region of integration. (There was some carelessness in [5] concerning this question.) In the next section we shall have an example in which the region of integration with respect to $\theta$ contains two points of equal importance.

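Let us now continue with the formal procedure. We have

$$(9) \qquad c(M, t) =_F \frac{(f(\rho))^t}{(2\pi)^l \rho^M} \int \cdots \int e^{itk'\theta - \frac{1}{2} t \theta' K \theta} \exp\Bigl\{ t \sum_{|r| \ge 3} \frac{\kappa_r (i\theta)^r}{r!} \Bigr\}\, d\theta,$$

where $K$ is the matrix of second-order cumulants

$$(10) \qquad K = \Bigl( \rho_i \frac{\partial}{\partial \rho_i} \Bigl( \rho_j \frac{\partial}{\partial \rho_j} \Bigr) \log f(\rho) \Bigr).$$

(I am taking the liberty of using $i$ as a suffix, besides as $\sqrt{-1}$.) The determinant of $K$ is $\Delta$, the Hessian of $\log f(e^{\xi_1}, \ldots, e^{\xi_l})$ (where now $\xi_1 = \log \rho_1$, etc.) and is positive (see [5], p. 874).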

If we now imagine the last exponential factor in (9) to be expanded as a 
(multiple) power series, the various terms of the integrand can be obtained by 
partial differentiation of the first exponential factor with respect to the first-order 
cumulants. By interchanging the order of differentiation and integration we get 


(f(o))' ro In "S(S)} ee ”  ithk’@-19’Ke 
c(M, t) wp LO) exp x = \aae ee [-e de, 


r 
a\" 
(ri) 


an ‘ 4 
. ,itk’@—41@' Ke i (2x) —}tk’'K~'k 
tee ¢ dé = e€ 
* - thtAi 


where 


(see, for example, Cramér [2], p. 119). Therefore 







(f(o))* ( Inlz3, d r| ~}tk’K-"k 
) at i, <a r os > 
(11) eM) ~e Gepnrgneat PE 2 3 lia) f° 


- wi 
Let us now introduce a function », , of r and t, defined by the identity 


Ir| 20 y ir m* 
Yee = ep (td Se) 


r r 


” (Hort ee gM Eth Eee’ KE 
fle) ; 


Let us also introduce a formal symbol v, to be manipulated as if it were a vector 
with | components, and an operator [---} which has the effect of replacing v‘ by 
v,. Then the part of (11) beginning “exp” can be written 


v, ( d : —htk’'K-'k v(d y ~in’K-%| 
La (4) P Nis’ La idk) ° f 


=F | exp — it (« “) K" (x + “I, 


by Taylor’s theorem in several variables. We have then 


(f(o))' T_ Bias a) sand \ 
(13) c(M, t) ~, Ont)eMAl exp — 5 (tk’ + w)K (tk + »)/ 
Now the Hermite polynomial in $l$ variables may be defined as

$$H_r(x \mid C) = (-1)^{|r|} \exp(\tfrac{1}{2} x'Cx)\, (d/dx)^r \exp(-\tfrac{1}{2} x'Cx),$$

where $C$ is an $l$ by $l$ symmetric matrix. (See, for example, Erdélyi, et al., [4], p. 285.) When $l = 1$, I shall adopt the convention

$$H_r(x) = (-1)^r e^{x^2} (d/dx)^r e^{-x^2} = H_r(x \mid 2).$$


We have then 


biog (f(o))* ~4ne'K-™% (—1)""», bcimiadl 
(8) mr hg et SU eae) 


In particular, when 9 is a saddlepoint of f(z)z ™, so that k = 0, we have 


~ (f(o))* : l , -] \ 
(15) c(M,t) ~r (2nt) Mal exp — 5 K " . 


; (f(e))' (—1)'*'y, sie 
16 (M, t) ~p Fie tt ee He) 
( ) c( ) F (Qat)*' Ma? 2 rift! 


(It is perhaps opportune to remind the reader at this point that $\nu_r$ depends on $t$.) Now the Hermite polynomials have the generating function

$$\sum_r \frac{a^r}{r!} H_r(x \mid C) = \exp\{\tfrac{1}{2} x'Cx - \tfrac{1}{2}(x' - a')C(x - a)\}$$

(see, for example, Erdélyi, et al., [4], p. 285), and in particular,

$$\sum_r \frac{a^r}{r!} H_r(0 \mid C) = e^{-\frac{1}{2} a'Ca} = \sum_{n=0}^{\infty} \frac{(-\tfrac{1}{2} a'Ca)^n}{n!}.$$

Therefore $H_r(0 \mid C) = 0$ if $|r|$ is odd, and, when $|r|$ is even,

$$(17) \qquad \frac{1}{r!} H_r(0 \mid C) = \mathcal{C}(a^r)\, \frac{(-\tfrac{1}{2} a'Ca)^{\frac{1}{2}|r|}}{(\tfrac{1}{2}|r|)!},$$

where $\mathcal{C}(\cdots)$ means "the coefficient of $\cdots$ in". So


(18) c(M,t) ~ _ (So) — eet & o @(a‘)(a’ K™a)*"! 
; (Qxt)oMa! + §« (3ir|)! 2t : 
in which, by the way, », = 0 when |r| = 2. 
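
For $l = 1$ and $C = 2$ the behaviour of the Hermite polynomials at the origin can be checked directly: the generating function gives $H_r(0) = 0$ for odd $r$ and $H_r(0) = (-1)^{r/2}\, r!/(r/2)!$ for even $r$. The sketch below is illustrative only; it uses the standard three-term recurrence $H_{k+1}(x) = 2xH_k(x) - 2kH_{k-1}(x)$ for the convention $H_r(x) = (-1)^r e^{x^2}(d/dx)^r e^{-x^2}$, and the function name is an assumption.

```python
from math import factorial

def hermite_at(n, x):
    """H_n(x) via the recurrence H_{k+1} = 2x H_k - 2k H_{k-1}, with H_0 = 1."""
    h_prev, h = 0, 1
    for k in range(n):
        h_prev, h = h, 2 * x * h - 2 * k * h_prev
    return h

for r in range(9):
    # Value at the origin implied by the generating function with C = 2
    expected = 0 if r % 2 else (-1) ** (r // 2) * factorial(r) // factorial(r // 2)
    print(r, hermite_at(r, 0), expected)
```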
When / = 2, and when (9, p’) is a saddlepoint (i.e., equations (6.8) and (6.9) 
of [5] are satisfied), we have, from (18), 


r (f(p p’))' = (—1)**? (=) (=) (Ss) 
19 M,N wy —__— —— Ji ——)} peses are. 
(19) (M,N,t) ~s 2atp™ p’N At hate hict7! \2ta/7 \taA/ \2ta rea 


When / = 1, and when pis a saddlepoint (i.e., equation (6.1) of [5] is satisfied), 
we have 


(20) c(M,t) ~y £60)” 3 2 /( ; y. 


op™(2xt) s= s! Det 


There are various methods for calculating the $\nu_r$'s. The most convenient one will depend on the function $f$ and on the computational resources. These methods will depend on the relationships between one or more pairs of the cumulants, $\kappa_r$, the moments, $\mu_r' = E(X^r)$, the moments $\mu_r = E(X - EX)^r$ about the mean (which are equal to the moments when we are using the saddlepoint method proper), and perhaps the factorial moments,

$$(21) \qquad \mu_{(r)} = (d/dx)^r \bigl(x^{-M/t} f(\rho_1 x_1, \ldots, \rho_l x_l)/f(\rho)\bigr) \Big|_{x_1 = \cdots = x_l = 1}.$$

Ordinary moments can be expressed in terms of factorial moments, using Stirling numbers of the second kind. Factorial moments can be expressed in terms of ordinary moments, using Stirling numbers of the first kind. (See Kendall [8], p. 57, and Riordan [12], pp. 33 and 48.) For the case $l = 1$, a table of relationships between moments and cumulants, up to $r = 10$, is given by Kendall [8], pp. 62-64. The $\nu_r$'s can be obtained from the formulae that express the moments in terms of the cumulants by multiplying the cumulants by $t$ and putting those of order less than 3 equal to zero. In this manner I have again checked formula (6.2) of [5].
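
For $l = 1$ the recipe just described can be sketched with the usual recursion between moments and cumulants, $m_n = \sum_{k=1}^{n} \binom{n-1}{k-1} c_k m_{n-k}$. This is an illustration rather than the author's computation: the function names and the numerical cumulants are hypothetical, chosen only to show that the sixth-order result has the form $t\kappa_6 + 10t^2\kappa_3^2$.

```python
from math import comb

def moments_from_cumulants(c, order):
    """m[n] from cumulants c[n] via m_n = sum_k C(n-1, k-1) c_k m_{n-k}, m_0 = 1."""
    m = [1] + [0] * order
    for n in range(1, order + 1):
        m[n] = sum(comb(n - 1, k - 1) * c[k] * m[n - k] for k in range(1, n + 1))
    return m

def nu_from_cumulants(kappa, t, order):
    """nu_r: multiply the cumulants by t and set those of order below 3 to zero."""
    scaled = [0, 0, 0] + [t * kappa[k] for k in range(3, order + 1)]
    return moments_from_cumulants(scaled, order)

# Hypothetical cumulants kappa_3..kappa_6 (entries 0..2 are ignored) and exponent t.
kappa = [0, 0, 0, 2, 3, 5, 11]
t = 7
nu = nu_from_cumulants(kappa, t, 6)
print(nu)
print(nu[6] == t * kappa[6] + 10 * t**2 * kappa[3] ** 2)
```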

It was pointed out by Lukacs [9], that, for $l = 1$, the relationships between moments and cumulants can be obtained from Faà di Bruno's formula for the repeated differentiation of a function of a function. When $l > 1$ we can either use the rules given by Kendall [7], or the generalization of Faà di Bruno's formula:







a\t I] fs a j 
rl (4) ¢ (¥ (x)) = Zz Ili! (2) ¢(y), 
s 
a formula that will now be explained, and then proved. 
(i) ris a multipartite number, i.e., a vector whose components are non-nega- 

tive integers. 

(ii) The number of components of ¢ need not be equal to the number of inde- 
pendent variables 2, , 22, --~* , i.e., to the number of components of x. 

(iii) y = &(x). The notation of partial differentiation on the right implies that 
y is not supposed to be expressed in terms of x before the differentiations 
are performed. 

(iv) i, is a function of the vector s and is itself a vector of dimensionality that 
of t and has non-negative integer components. 

(v) s #0. 

(vi) jis an abbreviation for ) + 3 

(vii) f£, = (1/s!) (d/dx)*y(x). 

(viii) By the time the summation sign is to be interpreted, s has already become 
a dummy variable, i.e., the summand is not a function of s. The summation 
is to be performed over all selections of the function i, for which 
> S\i,| = 1; in other words, when ¢ is a scalar function, over all parti- 
tions of the multipartite number r. 

(ix) In conformity with the notation for the factorial of a vector, [], i,! means 


IL “(1)_y (2), 
stg +tg +", 


(2) 


where i”, is”, --- are the components of i, . 
Proor or (22): By repeated applications of Taylor’s theorem in several 
variables, together, in the last step, with some applications of the multinomial 
theorem, we have 


>= (¢) o(u(x)) = o(u(x- w)) 


r! 
s| 21 
(an + Z w',) 


En (E ws) (5 
y ll nets) (y), 


£~-TT.,” (NS 
S I] i,! 
s 
and the result follows on equating coefficients of w’. 
In Faà di Bruno's formula both $y$ and $x$ are scalars. Riordan [11], who gives
earlier references, points out that the generalization to the case where there are 







several independent variables is purely a matter of notation. Here I have taken 
t; as a vector, for good measure. 


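By taking $\phi = \exp$, and $y$ as the scalar function $\psi(x) = \sum_r (\kappa_r/r!)\, x^r$, we get

$$(23) \qquad \mu_r = r! \sum \prod_s \frac{1}{i_s!} \Bigl(\frac{\kappa_s}{s!}\Bigr)^{i_s}.$$

By taking $\phi = \log$, and $\psi(x) = \sum_r (\mu_r/r!)\, x^r$, we get

$$(24) \qquad \kappa_r = r! \sum (-1)^{j-1} (j - 1)! \prod_s \frac{1}{i_s!} \Bigl(\frac{\mu_s}{s!}\Bigr)^{i_s}.$$

(Cf., Lukacs [9], for the case $l = 1$.) In both formulae the summation is over all $i_s$'s for which $i_s \ge 0$, $\sum_s s\, i_s = r$, and $|s| \ge 2$, and $j$ means $\sum_s i_s$. If the moments, $\mu_r$, are replaced by $\mu_r'$, then the condition $|s| \ge 2$ in (23) and (24) is to be replaced by $|s| \ge 1$. To get the $\nu_r$'s we replace the cumulants by $t\kappa_s$ in (23), and put $\kappa_s = 0$ when $|s| = 2$.

For example, when $l = 2$, we have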


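$\mu_{rs} = \kappa_{rs}$ if $2 \le r + s \le 3$,

$\kappa_{40} = \mu_{40} - 3\mu_{20}^2$, $\kappa_{31} = \mu_{31} - 3\mu_{20}\mu_{11}$, $\kappa_{22} = \mu_{22} - \mu_{20}\mu_{02} - 2\mu_{11}^2$, etc.

$\mu_{40} = \kappa_{40} + 3\kappa_{20}^2$, $\mu_{31} = \kappa_{31} + 3\kappa_{20}\kappa_{11}$, $\mu_{22} = \kappa_{22} + \kappa_{20}\kappa_{02} + 2\kappa_{11}^2$, etc.

$\kappa_{60} = \mu_{60} - 15\mu_{40}\mu_{20} - 10\mu_{30}^2 + 30\mu_{20}^3$

$\mu_{60} = \kappa_{60} + 15\kappa_{40}\kappa_{20} + 10\kappa_{30}^2 + 15\kappa_{20}^3$

$\kappa_{51} = \mu_{51} - 5\mu_{40}\mu_{11} - 10\mu_{31}\mu_{20} - 10\mu_{30}\mu_{21} + 30\mu_{20}^2\mu_{11}$

$\mu_{51} = \kappa_{51} + 5\kappa_{40}\kappa_{11} + 10\kappa_{31}\kappa_{20} + 10\kappa_{30}\kappa_{21} + 15\kappa_{20}^2\kappa_{11}$

$\kappa_{42} = \mu_{42} - \mu_{40}\mu_{02} - 8\mu_{31}\mu_{11} - 6\mu_{22}\mu_{20} - 4\mu_{30}\mu_{12} - 6\mu_{21}^2 + 6\mu_{20}^2\mu_{02} + 24\mu_{20}\mu_{11}^2$

$\mu_{42} = \kappa_{42} + \kappa_{40}\kappa_{02} + 8\kappa_{31}\kappa_{11} + 6\kappa_{22}\kappa_{20} + 4\kappa_{30}\kappa_{12} + 6\kappa_{21}^2 + 3\kappa_{20}^2\kappa_{02} + 12\kappa_{20}\kappa_{11}^2$

$\kappa_{33} = \mu_{33} - 3\mu_{31}\mu_{02} - 9\mu_{22}\mu_{11} - 3\mu_{13}\mu_{20} - \mu_{30}\mu_{03} - 9\mu_{21}\mu_{12} + 18\mu_{20}\mu_{02}\mu_{11} + 12\mu_{11}^3$

$\mu_{33} = \kappa_{33} + 3\kappa_{31}\kappa_{02} + 9\kappa_{22}\kappa_{11} + 3\kappa_{13}\kappa_{20} + \kappa_{30}\kappa_{03} + 9\kappa_{21}\kappa_{12} + 9\kappa_{20}\kappa_{02}\kappa_{11} + 6\kappa_{11}^3$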

The other three pairs of bivariate formulae of order six can be written down by symmetry. For our application the formulae of order five, and other odd orders, are irrelevant. The above formulae for the moments and cumulants check one another. A further check is that when a cumulant of order $r$ is expressed linearly in terms of moments about the mean, the sum of the coefficients is equal to $r!\, \mathcal{C}(x^r) \log(e^x - x)$, whatever be the value of $l$. Similarly, when a moment (of order $r$) about the mean is expressed linearly in terms of the cumulants, then the sum of the coefficients is equal to $r!\, \mathcal{C}(x^r) \exp(e^x - 1 - x)$. (These two assertions are readily proved by putting all the moments about the mean equal to 1 first, and second putting all the cumulants equal to 1.) Up to order ten we can therefore check against the case $l = 1$, given by Kendall [8].
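
The two coefficient-sum checks can be carried out with truncated power series in exact arithmetic. The sketch below is illustrative (the function names are assumptions): it expands $\exp(v)$ and $\log(1 + v)$ with $v = e^x - 1 - x$ and multiplies the coefficient of $x^r$ by $r!$. For order six it should give 41 for the moment formula ($1 + 15 + 10 + 15$) and 6 for the cumulant formula ($1 - 15 - 10 + 30$).

```python
from fractions import Fraction
from math import factorial

def mul(a, b, n):
    """Product of two power series truncated after the x^n term."""
    out = [Fraction(0)] * (n + 1)
    for i in range(n + 1):
        for j in range(n + 1 - i):
            out[i + j] += a[i] * b[j]
    return out

def coefficient_sums(n):
    """Return r! [x^r] exp(e^x - 1 - x) and r! [x^r] log(e^x - x) for r = 0..n."""
    v = [Fraction(0), Fraction(0)] + [Fraction(1, factorial(s)) for s in range(2, n + 1)]
    exp_v = [Fraction(1)] + [Fraction(0)] * n
    log_1pv = [Fraction(0)] * (n + 1)
    power = exp_v[:]                                  # v**0
    for k in range(1, n + 1):
        power = mul(power, v, n)                      # v**k
        exp_v = [e + c / factorial(k) for e, c in zip(exp_v, power)]
        sign = 1 if k % 2 else -1
        log_1pv = [l + sign * c / k for l, c in zip(log_1pv, power)]
    moment_sums = [int(factorial(r) * exp_v[r]) for r in range(n + 1)]
    cumulant_sums = [int(factorial(r) * log_1pv[r]) for r in range(n + 1)]
    return moment_sums, cumulant_sums

moment_sums, cumulant_sums = coefficient_sums(6)
print(moment_sums[4], moment_sums[6])       # order 4 and order 6 moment formulae
print(cumulant_sums[4], cumulant_sums[6])   # order 4 and order 6 cumulant formulae
```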

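Up to order six the formulae for the bivariate $\nu_{rs}$'s in terms of the bivariate cumulants are

$\nu_{rs} = 0$ if $1 \le r + s \le 2$

$\nu_{rs} = t\kappa_{rs}$ if $3 \le r + s \le 4$

$\nu_{60} = t\kappa_{60} + 10t^2\kappa_{30}^2$

$\nu_{51} = t\kappa_{51} + 10t^2\kappa_{30}\kappa_{21}$

$\nu_{42} = t\kappa_{42} + 4t^2\kappa_{30}\kappa_{12} + 6t^2\kappa_{21}^2$

$\nu_{33} = t\kappa_{33} + t^2\kappa_{30}\kappa_{03} + 9t^2\kappa_{21}\kappa_{12}$

We can write down $\nu_{24}$, $\nu_{15}$, $\nu_{06}$, by symmetry.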
For the case 1 = 2, the sum in formula (18) can now be written 


r+s=m4 1 r+s=6 


~ 488 >- vr C(ai af) (a’Ka)* + --- 


(27) 1+ > kre @(a; a3) (a’K'a)® 
r,s 

The last term given explicitly contains terms of order f', since the v,,’s of order 

six contain terms of order ¢’, but no omitted term does so. It is possible that, for 

any finite value of t, we should sometimes get a more accurate result by using 

the terms shown explicitly here than by using the terms shown explicitly in [5], 

formula (6.10). The difference between the explicit parts of these formulae is 


r+s=6 


(28) Dd kre C(a; a3) (a’K~'a)*. 
a ra 
To conclude this section I should like to summarize some advantages of using 
a saddlepoint method proper. 

(i) It is more difficult to justify formula (14) for the more general method (in 
which the first-order cumulants of our artificial random variable do not all 
vanish ), because the modulus of the integrand in (9) is liable not to decrease 
rapidly enough when we move away from its maximum. 

(ii) It is only when $k = 0$ that the series (14) consists of terms of smaller and
smaller order. For when $k \ne 0$ it can be shown that $H_r(kt^{1/2} \mid C)$ is of order
as large as $t^{|r|/2}$.

(iii) When $k = 0$ the Hermite polynomials vanish for odd values of $|r|$, and also
simplify for even values of $|r|$.
Nevertheless it seems worthwhile to notice the existence of the formulae with
$k \ne 0$, since
(a) Saddlepoints do not always exist. An example is given in the next section 
in which there is at any rate no real saddlepoint. 
(b) Even when a saddlepoint exists it is often numerically laborious to com- 
pute it. 





MULTIVARIATE SADDLEPOINTS & MULTINOMIAL x? 543 


(c) The more general formulae are of some mathematical interest and enable 
one to see the saddlepoint method in a more general context. 


3. Chi-squared for the Equiprobable Multinomial Distribution. In [5] I gave
a method of obtaining a saddlepoint approximation for the probability "density"
of

$$\chi^2 = tN^{-1} \sum_{r=1}^{t} (n_r - N/t)^2,$$

or equivalently of $S = \sum n_r^2$, for a $t$-category equiprobable multinomial dis-
tribution of sample size $N$, where the cell entries are $n_1, n_2, \cdots, n_t$. The pur-
pose of the present section is to continue the discussion of this method. The
calculation required the solution of two equations for a saddlepoint $(\rho, \rho')$
(equations (8.3) and (8.4) of [5]). In discussions between Mr. Peter John
Taylor and myself, arising out of attempts to solve the equations on an electronic
computer (Pegasus at the Admiralty), we discovered that these equations do
not have a solution when $\chi^2 > t$, and the saddlepoint method appears to break
down. This is unfortunate since $\chi^2 > t$ is much the more interesting case in most
applications. On page 877 of [5] I erroneously supposed that there is always a
solution.
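The equivalence between $\chi^2$ and $S$ can be written out explicitly. The short derivation below is added for convenience and uses only $\sum_r n_r = N$ and $S = \sum_r n_r^2$.

```latex
\chi^2 = \frac{t}{N}\sum_{r=1}^{t}\Bigl(n_r - \frac{N}{t}\Bigr)^2
       = \frac{t}{N}\Bigl(\sum_{r} n_r^2 - \frac{2N}{t}\sum_{r} n_r + t\cdot\frac{N^2}{t^2}\Bigr)
       = \frac{t}{N}\Bigl(S - \frac{N^2}{t}\Bigr)
       = \frac{tS}{N} - N .
```

Thus $S = M$ corresponds to $\chi^2 = tMN^{-1} - N$, and for $N = t$ simply to $\chi^2 = M - N$.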

There is, however, a way round the difficulty. 

Let the probability that $S = M$ (i.e., that $\chi^2 = tMN^{-1} - N$) be denoted by
$p(M \mid N, t)$. Then

$$p(M \mid N, t) = \mathcal{C}(x^M y^N)\, N!\, t^{-N} (f_1)^t, \tag{29}$$

where

$$f_1 = f_1(x, y) = \sum_{n=0}^{L} \frac{x^{n^2} y^n}{n!},$$

and $L = \min (N, M^{1/2})$. (There is little inaccuracy in taking $L$ smaller provided
that $P(\max n_r > L)$ is negligible. The probability can be estimated as in [5].)
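Formula (29) can be evaluated exactly for small $N$ and $t$ by building up the coefficients of $(f_1)^t$ one cell at a time. The following sketch is an illustration only (it is not the program used for the table, and it takes $L = N$ for simplicity); the function name is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial

def exact_distribution(N, t):
    """Return {M: P(S = M)} for the equiprobable multinomial with t cells."""
    # coeffs[(m, k)] = coefficient of x^m y^k in (f_1)^j after j cells
    coeffs = {(0, 0): Fraction(1)}
    for _ in range(t):
        nxt = defaultdict(Fraction)
        for (m, k), c in coeffs.items():
            for n in range(N - k + 1):          # entry n in the next cell
                nxt[(m + n * n, k + n)] += c / factorial(n)
        coeffs = nxt
    scale = Fraction(factorial(N), t ** N)      # N! t^{-N}
    return {m: c * scale for (m, k), c in coeffs.items() if k == N}
```

For $N = t = 10$ the value at $M = a + 10$ is the exact probability that $\chi^2 = a$; for instance `exact_distribution(10, 10)[10]` equals $10!/10^{10}$.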

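In [5] I took $L = \infty$. By taking $L$ finite it turns out that the saddlepoint
equations, namely

$$\sum_{n=0}^{L} \frac{n^2 \rho^{n^2} (\rho')^n}{n!} = \frac{M}{t} \sum_{n=0}^{L} \frac{\rho^{n^2} (\rho')^n}{n!}, \tag{30}$$

$$\sum_{n=0}^{L} \frac{n \rho^{n^2} (\rho')^n}{n!} = \frac{N}{t} \sum_{n=0}^{L} \frac{\rho^{n^2} (\rho')^n}{n!}, \tag{31}$$

always have a solution when $\chi^2 > t$, except perhaps in the trivial case $M = N^2$.
(The solution is unique in virtue of [5], p. 874.) This statement is a special case
of the following more general one.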

Suppose $M \ne N^2$. Let non-negative integers $\mu$, $\nu$ be defined by the inequalities
$\mu^2 < M/t \le (\mu + 1)^2$, $\nu < N/t \le \nu + 1$. Then the simultaneous equations (30)
and (31) have a solution if (i) $\mu > \nu$, and also if (ii) $\mu = \nu$ and

$$\frac{N\chi^2}{t^2} > \Bigl(\frac{N}{t} - \nu\Bigr)\Bigl(\nu + 1 - \frac{N}{t}\Bigr), \tag{32}$$

but they do not have a solution if (iii) $\mu = \nu$ and

$$\frac{N\chi^2}{t^2} < \Bigl(\frac{N}{t} - \nu\Bigr)\Bigl(\nu + 1 - \frac{N}{t}\Bigr). \tag{33}$$

(For example, there is no finite solution if $N < t$, $\chi^2 < t - N$.) This statement
covers essentially all cases since it is impossible for $\mu$ to be less than $\nu$.
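The three cases of this statement can be turned into a small decision routine. The sketch below is an illustration, not the author's program; it assumes positive integer arguments and uses the fact that, since $N\chi^2/t^2 = M/t - (N/t)^2$, condition (32) is equivalent to $M + t\nu(\nu + 1) > (2\nu + 1)N$. The function name is hypothetical.

```python
def saddlepoint_exists(M, N, t):
    """True / False as the statement asserts; None on the uncovered boundary."""
    if M == N * N:
        raise ValueError("the statement assumes M != N^2")
    # mu: mu^2 < M/t <= (mu + 1)^2
    mu = 0
    while t * (mu + 1) ** 2 < M:
        mu += 1
    # nu: nu < N/t <= nu + 1
    nu = (N - 1) // t
    if mu > nu:
        return True                       # case (i)
    if mu < nu:
        raise ValueError("inconsistent input: requires M/t >= (N/t)^2")
    lhs = M + t * nu * (nu + 1)           # (32)/(33) rewritten in terms of M
    rhs = (2 * nu + 1) * N
    if lhs > rhs:
        return True                       # case (ii)
    if lhs < rhs:
        return False                      # case (iii)
    return None                           # equality: neither (32) nor (33)
```

For example, `saddlepoint_exists(28, 10, 10)` falls under case (i), while $N = 5$, $t = 10$, $M = 4$ (so that $\chi^2 < t - N$) falls under case (iii).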
OUTLINE OF PROOF. For each $\rho > 0$, equation (30) has a positive solution for
$\rho'$. Call it $\rho_1 = \rho_1(\rho)$. Similarly equation (31) has a solution $\rho_2 = \rho_2(\rho)$. The con-
dition for the existence of a joint solution is that $\rho_1 - \rho_2$ changes sign when $\rho$
increases from 0 to $\infty$. ($\rho_1$ and $\rho_2$ are continuous functions of $\rho$.)
Equation (30) can be written in the form

$$\frac{M}{t} + \Bigl(\frac{M}{t} - 1\Bigr)\rho\rho_1 + \cdots + \Bigl(\frac{M}{t} - \mu^2\Bigr)\frac{\rho^{\mu^2}\rho_1^{\mu}}{\mu!} = \Bigl((\mu + 1)^2 - \frac{M}{t}\Bigr)\frac{\rho^{(\mu+1)^2}\rho_1^{\mu+1}}{(\mu + 1)!} + \cdots + \Bigl(L^2 - \frac{M}{t}\Bigr)\frac{\rho^{L^2}\rho_1^{L}}{L!},$$

in which the terms on both sides are all non-negative. When $\rho$ is very small it
turns out that we can approximate the relationship between $\rho$ and $\rho_1$ by retain-
ing only the last term on the left and the first term on the right (or the second
one if $M/t$ is an integer that is a perfect square). But when $\rho \to \infty$ it turns out
that we need retain only the first term on the left and the last one on the right
(even if $L = N$, $t = 1$, provided that $M \ne N^2$). (If $L$ were infinite there would
not be a last term on the right.) We find


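$$\rho_1 \sim \Bigl(\frac{L!\, M/t}{L^2 - M/t}\Bigr)^{1/L} \rho^{-L}, \qquad \rho_2 \sim \Bigl(\frac{L!\, N/t}{L - N/t}\Bigr)^{1/L} \rho^{-L} \qquad \text{as } \rho \to \infty,$$

and similarly

$$\rho_1 \sim (\mu + 1)\, \frac{M/t - \mu^2}{(\mu + 1)^2 - M/t}\, \rho^{-(2\mu+1)} \quad \text{as } \rho \to 0, \qquad M/t \ne (\mu + 1)^2;$$

$$\rho_2 \sim (\nu + 1)\, \frac{N/t - \nu}{\nu + 1 - N/t}\, \rho^{-(2\nu+1)} \quad \text{as } \rho \to 0, \qquad N/t \ne \nu + 1;$$

$$\rho_1 \sim A_1 \rho^{-2(\mu+1)} \quad \text{as } \rho \to 0, \qquad M/t = (\mu + 1)^2 \ (\text{where } A_1 \text{ is independent of } \rho),$$

$$\rho_2 \sim A_2 \rho^{-2(\nu+1)} \quad \text{as } \rho \to 0, \qquad N/t = \nu + 1 \ (\text{where } A_2 \text{ is independent of } \rho).$$

We now see easily that $\rho_1 < \rho_2$ when $\rho \to \infty$, if $M < N^2$.
When $\rho \to 0$, then $\rho_1 > \rho_2$ if $\mu > \nu$. Now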


$$\frac{M}{t} = \Bigl(\frac{N}{t}\Bigr)^2 + \frac{N\chi^2}{t^2}, \tag{34}$$

so if $N/t = \nu + 1$, we have $\mu \ge \nu + 1 > \nu$. Therefore $\rho_1 > \rho_2$ if $N/t$ is an integer.
If $N/t$ is not an integer, then $\rho_1 > \rho_2$ if $M/t = (\mu + 1)^2$ (since $\mu \ge \nu$). Finally
if $N/t \ne \nu + 1$, $M/t \ne (\mu + 1)^2$, and $\mu = \nu$; then $\rho_1 \ge \rho_2$ when $\rho \to 0$ pro-
vided that

$$\frac{M/t - \nu^2}{(\nu + 1)^2 - M/t} \ge \frac{N/t - \nu}{\nu + 1 - N/t},$$

and this reduces to the asserted condition in virtue of the relation (34).
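The reduction to the asserted condition can be spelled out as follows. This is a sketch added for the reader, assuming that the comparison in the case $\mu = \nu$ is between the leading coefficients $(M/t - \nu^2)/\{(\nu+1)^2 - M/t\}$ and $(N/t - \nu)/(\nu + 1 - N/t)$, both of whose denominators are positive.

```latex
\Bigl(\frac{M}{t} - \nu^2\Bigr)\Bigl(\nu + 1 - \frac{N}{t}\Bigr)
   \ge \Bigl(\frac{N}{t} - \nu\Bigr)\Bigl((\nu + 1)^2 - \frac{M}{t}\Bigr)
\iff \frac{M}{t} + \nu(\nu + 1) \ge (2\nu + 1)\frac{N}{t}
\iff \frac{N\chi^2}{t^2} \ge (2\nu + 1)\frac{N}{t} - \Bigl(\frac{N}{t}\Bigr)^2 - \nu(\nu + 1)
   = \Bigl(\frac{N}{t} - \nu\Bigr)\Bigl(\nu + 1 - \frac{N}{t}\Bigr).
```

The first equivalence follows on expanding both products and cancelling the common term $-(M/t)(N/t)$; the second uses $M/t = (N/t)^2 + N\chi^2/t^2$.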
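We may observe that, if $L = \infty$, equations (30) and (31), which are now
equations (8.3) and (8.4) of [5], can be solved when $\rho = 1$, and give

$$\rho_1(1) = -\tfrac{1}{2} + \Bigl(\frac{M}{t} + \frac{1}{4}\Bigr)^{1/2}, \qquad \rho_2(1) = \frac{N}{t},$$

and so

$$\rho_1(1) > \rho_2(1) \quad \text{if } \chi^2 > t,$$
$$\rho_1(1) < \rho_2(1) \quad \text{if } \chi^2 < t.$$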
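These values can be verified directly. The following worked step is added for convenience: when $L = \infty$ and $\rho = 1$ the weights $(\rho')^n/n!$ are proportional to a Poisson distribution with mean $\rho'$, whose first two moments are $\rho'$ and $\rho' + \rho'^2$.

```latex
\frac{\sum_n n\,(\rho')^n/n!}{\sum_n (\rho')^n/n!} = \rho' = \frac{N}{t}
   \;\Longrightarrow\; \rho_2(1) = \frac{N}{t},
\qquad
\frac{\sum_n n^2 (\rho')^n/n!}{\sum_n (\rho')^n/n!} = \rho' + (\rho')^2 = \frac{M}{t}
   \;\Longrightarrow\; \rho_1(1) = -\tfrac12 + \Bigl(\frac{M}{t} + \frac14\Bigr)^{1/2},
\qquad
\rho_1(1) > \rho_2(1)
   \iff \frac{M}{t} > \Bigl(\frac{N}{t}\Bigr)^2 + \frac{N}{t}
   \iff \frac{N\chi^2}{t^2} > \frac{N}{t}
   \iff \chi^2 > t .
```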


We may therefore expect that, even if $L$ is finite, the value of $\rho$ satisfying equa-
tions (30) and (31) usually exceeds 1 when $\chi^2 > t$ and is usually less than 1
(if it exists) when $\chi^2 < t$. Of course, when $L = \infty$, values of $\rho$ exceeding 1 are
not legitimate, and I suspect that $f_1(x, y)$ cannot be continued analytically across
the boundary $|x| = 1$.

Mr. P. J. Taylor has kindly written a Pegasus Autocode program for the
solution of equations (30) and (31). Using this program, pairs of values of $\rho$ and
$\rho'$ were obtained with $N = t = 10$, $L = 11$ ($L = 10$ would have been adequate)
and $M = 28(2)46$. As evidence of the correctness of the program, I quote two
pairs of results. For $M = 28$, we obtained $\rho = 1.1552500$, $\rho' = 0.5914205$; and
for $M = 46$, $\rho = 1.2158011$, $\rho' = 0.4119696$. (The seventh places of decimals are
unreliable.) The method of solution was to start with a trial value of $\rho$ (either
guessed or derived from the preceding value of $M$), then to solve equation (31)
for $\rho'$, then use this value of $\rho'$ to solve equation (30) for $\rho$, and so on; the whole
procedure being greatly speeded up by assuming the consecutive differences in
the values of $\rho$ to form a geometrical progression. This procedure can be seen to
converge by drawing the graphs of $\rho_1(\rho)$ and $\rho_2(\rho)$. (The procedure would diverge
if equations (30) and (31) were taken in the opposite order.) The solution of
each equation separately was obtained by a crude method of bisection, the time
being halved by making use of some fairly readily estimated upper and lower
bounds. (Each bisection gives one additional binary place.)
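A modern reader can reproduce the quoted pairs with a few lines of code. The sketch below is not the Pegasus Autocode program and does not follow its alternating scheme: as a simpler variant it solves (31) for $\rho'$ by bisection for each trial $\rho$, and then bisects on $\rho$ to satisfy (30). It assumes equations (30) and (31) in the ratio form $\sum n^2 w_n / \sum w_n = M/t$ and $\sum n\, w_n / \sum w_n = N/t$ with $w_n = \rho^{n^2} (\rho')^n / n!$; the function names and the bracketing intervals are choices made here.

```python
from math import exp, lgamma

def weighted_means(log_rho, log_rho1, L):
    """Return (sum n w / sum w, sum n^2 w / sum w) for w_n = rho^(n^2) rho'^n / n!."""
    logs = [n * n * log_rho + n * log_rho1 - lgamma(n + 1) for n in range(L + 1)]
    top = max(logs)                           # rescale to avoid overflow
    w = [exp(v - top) for v in logs]
    total = sum(w)
    mean_n = sum(n * wn for n, wn in enumerate(w)) / total
    mean_n2 = sum(n * n * wn for n, wn in enumerate(w)) / total
    return mean_n, mean_n2

def bisect(func, lo, hi, steps=100):
    """Bisection for a function that is negative at lo and positive at hi."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_saddlepoint(M, N, t, L):
    """Solve (30) and (31) for (rho, rho'), working with logarithms."""
    def log_rho1_from_31(log_rho):
        # for fixed rho the weighted mean of n increases with rho'
        return bisect(lambda v: weighted_means(log_rho, v, L)[0] - N / t,
                      -300.0, 300.0)

    def residual_30(log_rho):
        return weighted_means(log_rho, log_rho1_from_31(log_rho), L)[1] - M / t

    log_rho = bisect(residual_30, -3.0, 3.0)
    return exp(log_rho), exp(log_rho1_from_31(log_rho))
```

With `solve_saddlepoint(28, 10, 10, 11)` the result should agree with the first quoted pair to about the accuracy stated in the text; the bracket $[-3, 3]$ for $\log \rho$ is adequate for the cases $N = t = 10$ considered here but is not claimed to be general.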

Had our intention been to produce tables we would have taken L considerably 
larger than N, so that the results would be applicable for considerably larger 
values of N, M, and t, having the same ratios, or as trial values when the ratios 
were approximately the same. (This point will be exemplified below. ) 

We can now apply Theorem 6.2 of [5], but with the following slight modifica- 
tion. 

In the present problem the parity of $M$ is the same as that of $N$, otherwise
$p(M, N, t)$ vanishes. Therefore condition (6.16) of [5] is not satisfied (see [5],
pp. 873 and 874). There are here two equally important saddlepoints, namely at
$(\rho, \rho')$ and at $(-\rho, -\rho')$. These make numerically equal contributions and the
signs agree or disagree according as $M$ and $N$ are of the same or of opposite
parities. Therefore the formulae for $c(M, N, t)$ and $p(M, N, t)$ need to be doubled
when $M$ and $N$ are of the same parity. This point was overlooked in [5], and


formula (8.5) should read 
a (: + . + +) (t even) 
(35) P(x? =t|N =t) = dv 6 


\0 (t odd). 

In order to calculate formula (6.10) of [5] I used another Pegasus autocode
program. The values of $\rho$ and $\rho'$ corresponding to $t = N = 10$, $L = 11$,
$M = 28(2)46$ were those obtained in the previous program. For $M = 20$, $\rho$ and $\rho'$
are both approximately equal to 1 (cf. [5], p. 877), and this case gave a good
approximate check on the program.

Column (i) of the table gives all the possible values, $a$, of $\chi^2$, when $N = t = 10$
(and also the impossible values $a = 38$ and $a = 46$). Column (ii) gives the
precise probabilities that $\chi^2 = a$. This column was kindly calculated by Mr. P. J.
Taylor, by using the formula

$$P(S = M) = \sum \frac{N!}{n_1! \cdots n_t!}\, \frac{1}{t^N} \qquad (M = a + 10),$$

summed over all $n_1, \cdots, n_t$ for which $n_1 + \cdots + n_t = N$, $n_1^2 + \cdots + n_t^2 = M$.
(I give these exact probabilities in full detail in case the reader wishes to test
some other method of approximation.) The number in brackets, following each
probability, is the number of partitions of $N$ corresponding to that probability.
The smallness of these numbers of partitions suggests that $N$ is likely to be too
small for our asymptotic approximations to be very accurate.

Column (iii) gives the gamma-variate approximation (obtained from the
tables of Pearson [10]),

$$\frac{1}{2^{4.5}\,\Gamma(4.5)} \int_{a-1}^{a+1} e^{-x/2} x^{3.5}\, dx,$$







TABLE
Accurate and Approximate Values of $P(\chi^2 = a)$ when $N = t = 10$

 (i)    (ii)            (iii)            (iv)                 (v)                     (vi)
  a     P(χ² = a)       Gamma approx.    1st term of (6.10)   %age correction to      Two terms of (6.10)
                                         doubled              allow for next term     doubled

  0     .000362880
  2     .016329600
  4     .114307200      .130
  6     .212284800      .197
  8     .223851600      .199
 10     .193369680      .163
 12     .082555200      .114
 14     .085730400      .071
 16     .031752000      .0422
 18     .015699600      .0235
 20     .010160640      .0125
 22     .007620480      .0066
 24     .002850120      .00322
 26     .001383480      .00159
 28     .000181440      .000739
 30     .000635040      .000354
 32     .000725760      .000163
 34     .000045360      .000074
 36     .000060480      .0000334
 38     .000000000      .0000150
 40     .000001134
 42     .000062370
 44     .000025920
 46     .000000000
 48     .000001080
 56     .000003240
 58     .000000405
 72     .000000090
 90     .000000001

Total  1.000000000





in which the range of integration corresponds to the use of the continuity cor-
rection apparently first published by Cochran [1], which is analogous to that used
for 2 × 2 contingency tables. Column (iv) gives the values obtained from the
leading term of (6.10), after doubling for the reason mentioned above. Column
(v) gives the percentage corrections to be made to allow for the term of order
$1/t$. These corrections are disappointingly large, and, when above 60%, do not
improve the estimates. Column (vi) gives (6.10) to order $1/t$.
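The gamma-variate column can be recomputed without tables. The sketch below is an illustration and makes one assumption explicit: column (iii) is taken to be the probability that a chi-squared variate with $t - 1 = 9$ degrees of freedom lies between $a - 1$ and $a + 1$. It uses the closed form of the incomplete gamma function for half-integer index; the function names are hypothetical.

```python
from math import erf, exp, gamma, sqrt

def reg_lower_gamma_4p5(x):
    """Regularized incomplete gamma function P(4.5, x) via the half-integer closed form."""
    if x <= 0:
        return 0.0
    # P(1/2, x) = erf(sqrt(x)); each unit step in the index subtracts x^s e^{-x} / Gamma(s + 1)
    tail = sum(x ** (j - 0.5) / gamma(j + 0.5) for j in range(1, 5))
    return erf(sqrt(x)) - exp(-x) * tail

def gamma_approx(a):
    """Mass of chi-squared (9 d.f.) between a - 1 and a + 1."""
    return reg_lower_gamma_4p5((a + 1) / 2) - reg_lower_gamma_4p5((a - 1) / 2)
```

Under that assumption `gamma_approx(4)` and `gamma_approx(6)` reproduce the tabulated .130 and .197 to three decimals; small differences in later entries may reflect interpolation in the printed tables.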

The gamma-variate approximation to the tail-area probability (with con- 
tinuity correction ) is never wrong by more than a factor of 2, in the range covered 
by the above table, when the tail-area probability exceeds yp, even though the 
expectation in each cell of our multinomial distribution is only 1. It would be 
interesting to know whether (6.10) of [5] would give better results for larger
values of $N$ and $t$, with $N \le t$. The exact values of $p(M, N, t)$ could be calcu-
lated from (4.6) of [5], or from a recurrence relation derivable from it. The exact
calculation of $p(M, N, t)$ for all $M \le M_0$ and all $N \le N_0$ would require about
4 MiN¢ loge t multiplications. 


REFERENCES
[1] W. G. Cochran, "The χ² correction for continuity," Iowa State Col. J. Sci., Vol. 16
(1942), pp. 421-436.
[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
1946.
[3] H. E. Daniels, "Saddlepoint approximations in statistics," Ann. Math. Stat., Vol.
25 (1954), pp. 631-650.
[4] A. Erdélyi, W. Magnus, F. Oberhettinger, and F. G. Tricomi, Staff of the Bateman
manuscript project, Higher Transcendental Functions, Vol. II, McGraw-Hill,
New York, Toronto, London, 1953.
[5] I. J. Good, "Saddlepoint methods for the multinomial distribution," Ann. Math.
Stat., Vol. 28 (1957), pp. 861-881.
[6] Harold Jeffreys and Bertha Swirles Jeffreys, Methods of Mathematical Physics,
University Press, Cambridge, 1946.
[7] M. G. Kendall, "The derivation of multivariate sampling formulae from univariate
formulae by symbolic operation," Ann. Eugenics, London, Vol. 10 (1941), pp.
392-402.
[8] Maurice G. Kendall, The Advanced Theory of Statistics, Vol. I, Charles Griffin, Lon-
don, 1945.
[9] Eugene Lukacs, "Applications of Faà di Bruno's formula in mathematical statistics,"
Amer. Math. Monthly, Vol. 62 (1955), pp. 340-348.
[10] Karl Pearson, Tables of the Incomplete Γ-Function, Biometrika Office, London, 1922,
1934.
[11] John Riordan, "Derivatives of composite functions," Bull. Amer. Math. Soc., Vol.
52 (1946), pp. 664-667.
[12] John Riordan, An Introduction to Combinatorial Analysis, John Wiley, New York;
Chapman and Hall, London, 1958.








A GENERALIZATION OF WALD’S IDENTITY WITH APPLICATIONS 
TO RANDOM WALKS 


By H. D. MILLER 
Statistical Laboratory, University of Cambridge 


0. Summary. Let $S_m = X_1 + \cdots + X_m$, where the $X_i$ are independent
random variables with common m.g.f. $\phi(t)$ which is assumed to exist in a real
interval containing $t = 0$. Let the random variable $n$ be defined as the smallest
integer $m$ for which either $S_m \ge \alpha$ or $S_m \le -\beta$ ($\alpha > 0$, $\beta > 0$). Thus $n$ can be
regarded as the time to absorption for the random walk $S_m$ with absorbing bar-
riers at $\alpha$ and $-\beta$. Let $S = S_n$ and let

$$F_m(x) = P(-\beta < S_k < \alpha \text{ for } k = 1, 2, \cdots, m - 1 \text{ and } S_m \le x).$$

The main result of the paper is the identity

$$E(e^{tS} z^n) = 1 + [z\phi(t) - 1] F(z, t), \tag{0.1}$$

where

$$F(z, t) = \sum_{m=0}^{\infty} z^m \int_{-\beta}^{\alpha} e^{tx}\, dF_m(x).$$

Wald's identity follows formally from (0.1) by setting $z = [\phi(t)]^{-1}$. Regions of
validity of (0.1) and of Wald's identity are discussed, and it is shown that the
latter holds for a larger range of values of $t$ than is usually supposed.

In Section 5 there are three examples. In the first we consider the case where 
there is a single absorbing barrier and where the X; are discrete and bounded. 
This is a gambler’s ruin problem, and we obtain an expression for the prob- 
ability of ruin. In the second we use the classical random walk to illustrate the 
region of validity of (0.1). In the third we obtain the Laplace transform of the 
distribution of the time to absorption in a random walk in which steps of +1 and 
—1 occur at random in continuous time. 


1. Introduction. Let $X_1, X_2, \cdots$ be a sequence of independent random
variables with common distribution function $A(x)$ and moment generating
function

$$\phi(t) = \int_{-\infty}^{\infty} e^{tx}\, dA(x). \tag{1.1}$$

Let $S_0 = 0$ and let $S_m$ denote the cumulative sum

$$S_m = X_1 + X_2 + \cdots + X_m, \qquad m \ge 1.$$

We ignore the trivial case where the $X_i$ are constant with probability 1. The
$X_i$ can be regarded as the successive steps of a particle starting at the origin
and $S_m$ then represents the distance of the particle from the origin at the $m$th
Received May 25, 1960; revised November 4, 1960.





550 H. D. MILLER 


step. Suppose that there are two absorbing barriers, one at $\alpha$ and the other at
$-\beta$ ($\alpha > 0$, $\beta > 0$), and that the regions $S_m \ge \alpha$ and $S_m \le -\beta$ are absorbing
regions. Let $n$ be the integral-valued random variable denoting the step at which
absorption occurs. Thus $n \ge 1$ is defined by

$$-\beta < S_m < \alpha, \qquad m = 1, 2, \cdots, n - 1,$$
$$S_n \le -\beta \quad \text{or} \quad S_n \ge \alpha.$$

For convenience, write $S = S_n$. Then the fundamental identity of Wald [12] is

$$E[\{\phi(t)\}^{-n} e^{tS}] = 1. \tag{1.2}$$

If $\alpha$ and $\beta$ are both finite and $t$ is complex, then Wald showed (1.2) to be valid
for all values of $t$ for which $|\phi(t)| \ge 1$.

In the single barrier case, say where $\alpha < \infty$, $\beta = \infty$, (1.2) has been used to
determine $P(n < \infty)$. This is the probability of ultimate ruin in the gambler's
ruin problem where $X_j$ is the gambler's loss at the $j$th play and $E(X_j) < 0$. His
initial capital is $\alpha$. (Cf. Bahadur [1], Bartlett [2], p. 89, Wald [12].) Another
application of Wald's identity is the determination of the characteristic function
of $n$ (Wald [12]). Some authors, in particular Bellman [3], Blackwell and
Girshick [4], Ruben [9], and Tweedie [11] have generalized (1.2) in the direction
of widening the class of processes for which such an identity is valid. Doob
([5], pp. 350-352) has shown that (1.2) may be derived from the theory of
Martingales. In statistics the most important application of (1.2) has been in
sequential analysis. However, the present paper has been written from the point
of view of random walks rather than that of sequential analysis.

Generally, Wald's identity seems to have the character of an isolated result,
unconnected with the Chapman-Kolmogorov relations which hold for a Markov
process such as the random walk. In the present paper we stress the Chapman-
Kolmogorov approach and show that (1.2) may be derived thereby. We restrict
our attention to random variables whose distribution admits a moment generat-
ing function, since it is in this case that we are able to discuss regions of validity
of (0.1) and (1.2). However, (0.1) is true even if $\phi(t)$ exists only on the imagin-
ary axis.


2. Notation and Definitions. We adopt the convention that a single absorbing
barrier at, say, $\alpha$ is denoted by $\alpha < \infty$, $\beta = \infty$. We define

$$F_m(x) = P(-\beta < S_k < \alpha \text{ for } k = 1, 2, \cdots, m - 1 \text{ and } S_m \le x), \tag{2.1}$$
$$F_0(x) = 1 \quad (x \ge 0), \qquad F_0(x) = 0 \quad (x < 0).$$

$F_m(x)$ is a distribution function in an extended sense since $F_m(\infty) < 1$ in
general, owing to the fact that probability has been "leaking" through the bar-
riers (cf. Bartlett [2], p. 16). The relevant Chapman-Kolmogorov relations are
the recurrence relations satisfied by the $F_m(x)$, namely





GENERALIZATION OF WALD’S IDENTITY 


$$F_m(x) = \int_{-\beta}^{\alpha} A(x - y)\, dF_{m-1}(y). \tag{2.2}$$


We define the double generating function

$$F(z, t) = \sum_{m=0}^{\infty} z^m \int_{-\beta}^{\alpha} e^{tx}\, dF_m(x), \qquad (0 < \alpha \le \infty,\ 0 < \beta \le \infty), \tag{2.3}$$

where $z$ and $t$ are complex variables whose respective regions will be stated as
the need arises. We define $G_m(x)$ to be the distribution function for the unre-
stricted sum $S_m$, i.e.,

$$G_m(x) = P(S_m \le x). \tag{2.4}$$

We shall assume in the sequel, except in Section 4(i), that the integral (1.1)
defining $\phi(t)$ is convergent in a real interval surrounding $t = 0$, say $b < t < a$,
where $-\infty \le b < 0 < a \le \infty$. This will be true, for example, if $A'(x)$ exists
and decreases exponentially as $x \to \pm\infty$. It follows that $\phi(t)$ is an analytic
function of $t$ in the strip $b < \mathrm{Re}(t) < a$, and for real $t$, $\phi''(t) > 0$. Thus $\phi(t)$ can
have at most one minimum in $b < t < a$, and we assume that this minimum
exists and that it occurs at the point $t_0$. Thus $t_0$ is the unique real root of $\phi'(t) = 0$
in $b < t < a$. (The point $t_0$ does not necessarily exist for an analytic moment
generating function, for consider the probability generating function

$$M(z) = A[(1 - cz)^{3/2} + (3/2)cz] + B[(1 - cz^{-1})^{3/2} + (3/2)cz^{-1}],$$

where $0 < c < 1$ and $A$ and $B$ are chosen so that $M(1) = 1$. $M(z)$ has a Laurent
expansion in the annulus $c < |z| < c^{-1}$, and $A$ and $B$ may be chosen so that
$M'(z)$ has the same sign throughout the real interval $c < z < c^{-1}$.)

Let $\mu = E(X_i) = \phi'(0)$. Then $t_0 \lessgtr 0$ according as $\mu \gtrless 0$, and if $\mu \ne 0$ then
$0 < \phi(t_0) < 1$.


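3. Main Results.


LEMMA 3.1. Let $u$ denote the real part of $t$. Then the series (2.3) defining $F(z, t)$
is convergent in the region
(i)
$$|z| < [\phi(t_0)]^{-1}, \quad t_0 \le u < \infty; \qquad |z| < [\phi(u)]^{-1}, \quad b < u < t_0, \tag{3.1}$$
if $\alpha < \infty$, $\beta = \infty$, and correspondingly
$$|z| < [\phi(t_0)]^{-1}, \quad -\infty < u \le t_0; \qquad |z| < [\phi(u)]^{-1}, \quad t_0 < u < a, \tag{3.2}$$
if $\alpha = \infty$, $\beta < \infty$;
(ii)
$$|z| < [\phi(t_0)]^{-1}, \quad \text{all finite } t, \tag{3.3}$$
if $\alpha < \infty$, $\beta < \infty$.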






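PROOF.
(i) Suppose that $\alpha < \infty$, $\beta = \infty$ and $u \ge t_0$. Then

$$F(z, t) = \sum_{m=0}^{\infty} z^m \int_{-\infty}^{\alpha} e^{tx}\, dF_m(x). \tag{3.4}$$

Now we have

$$\Bigl|\int_{-\infty}^{\alpha} e^{tx}\, dF_m(x)\Bigr| \le \int_{-\infty}^{\alpha} e^{ux}\, dF_m(x) \le e^{\alpha(u - t_0)} \int_{-\infty}^{\alpha} e^{t_0 x}\, dF_m(x)$$
$$\le e^{\alpha(u - t_0)} \int_{-\infty}^{\infty} e^{t_0 x}\, dG_m(x) = e^{\alpha(u - t_0)} [\phi(t_0)]^m.$$

Thus if $u \ge t_0$, the series (3.4) is convergent for $|z| < [\phi(t_0)]^{-1}$.

If $u < t_0$, then

$$\Bigl|z^m \int_{-\infty}^{\alpha} e^{tx}\, dF_m(x)\Bigr| \le |z|^m \int_{-\infty}^{\alpha} e^{ux}\, dF_m(x) \le |z|^m \int_{-\infty}^{\infty} e^{ux}\, dG_m(x) = |z|^m [\phi(u)]^m,$$

and in this case the series (3.4) converges for $|z| < [\phi(u)]^{-1}$.
A similar argument gives the corresponding result for the case where $\alpha = \infty$
and $\beta < \infty$.
(ii) If $\alpha$ and $\beta$ are both finite, then by a similar argument to that in (i) we
have

$$\Bigl|\int_{-\beta}^{\alpha} e^{tx}\, dF_m(x)\Bigr| \le e^{\alpha(u - t_0)} [\phi(t_0)]^m \quad \text{for } u \ge t_0,$$
$$\Bigl|\int_{-\beta}^{\alpha} e^{tx}\, dF_m(x)\Bigr| \le e^{-\beta(u - t_0)} [\phi(t_0)]^m \quad \text{for } u \le t_0.$$

Thus in this case the series (2.3) converges for all finite $t$ and $|z| < [\phi(t_0)]^{-1}$.
This completes the proof.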

CONVENTION. In the sequel we adopt the convention that in cases where
$P(n < \infty)$, the probability of absorption, is less than unity, the expectation
symbol $E$ is taken to mean $p_a E_a$, where $p_a = P(n < \infty)$ and $E_a$ denotes ex-
pectation conditional on absorption.

THEOREM 3.1. (A Generalization of Wald's identity). Let $S$ and $n$ denote the
random variables defined in Section 1. Then we have the identity

$$E(e^{tS} z^n) = 1 + [z\phi(t) - 1] F(z, t), \tag{3.5}$$

and provided we adopt the above convention, (3.5) holds for all $t$ for which $\phi(t)$
exists and for $|z| < [\phi(t_0)]^{-1}$.


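PROOF. In virtue of the definition of $F_m(x)$, (2.1), we have

$$E(e^{tS} z^n) = \sum_{m=1}^{\infty} z^m \Bigl(\int_{-\infty}^{-\beta} + \int_{\alpha}^{\infty}\Bigr) e^{tx}\, dF_m(x)$$
$$= \sum_{m=1}^{\infty} z^m \Bigl(\int_{-\infty}^{\infty} - \int_{-\beta}^{\alpha}\Bigr) e^{tx}\, dF_m(x)$$
$$= \sum_{m=1}^{\infty} z^m \int_{-\infty}^{\infty} e^{tx}\, dF_m(x) - (F(z, t) - 1),$$

provided that the series on the right hand side converges; it converges absolutely
for $|z| < [\phi(u)]^{-1}$, $b < u < a$, where again $u = \mathrm{Re}(t)$, since

$$\Bigl|\int_{-\infty}^{\infty} e^{tx}\, dF_m(x)\Bigr| \le \int_{-\infty}^{\infty} e^{ux}\, dG_m(x) = [\phi(u)]^m.$$

In virtue of the product theorem for the two-sided Laplace-Stieltjes transform
(Widder [14], Ch. VI, Theorem 16a), we obtain from (2.2), on inverting the
order of integration,

$$\int_{-\infty}^{\infty} e^{tx}\, dF_m(x) = \int_{-\infty}^{\infty} e^{ty}\, dA(y) \int_{-\beta}^{\alpha} e^{tx}\, dF_{m-1}(x) = \phi(t) \int_{-\beta}^{\alpha} e^{tx}\, dF_{m-1}(x), \qquad (b < u < a).$$

Hence

$$E(e^{tS} z^n) = \sum_{m=1}^{\infty} z^m \phi(t) \int_{-\beta}^{\alpha} e^{tx}\, dF_{m-1}(x) - (F(z, t) - 1)$$
$$= 1 + [z\phi(t) - 1] F(z, t),$$

which is the identity (3.5).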
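The identity can be checked numerically in a simple case. The sketch below is not part of the paper: it assumes the walk with steps $+1$ (probability $p$) and $-1$ (probability $q = 1 - p$), integer barriers $\alpha$ and $-\beta$, and real $t$ and $z$, and it truncates both series after a fixed number of steps. The function name and the truncation length are illustrative choices.

```python
from math import exp

def check_identity(p, alpha, beta, t, z, steps=2000):
    """Return (lhs, rhs) of (3.5) for the walk with steps +1 (prob. p) and -1 (prob. 1 - p)."""
    q = 1.0 - p
    phi = p * exp(t) + q * exp(-t)           # m.g.f. of a single step
    mass = {0: 1.0}                          # unabsorbed probability at interior points
    lhs = 0.0                                # E(e^{tS} z^n)
    F = 0.0                                  # F(z, t)
    zm = 1.0                                 # current power z^m
    for _ in range(steps):
        F += zm * sum(exp(t * x) * pr for x, pr in mass.items())
        zm *= z
        nxt = {}
        for x, pr in mass.items():
            for y, w in ((x + 1, p * pr), (x - 1, q * pr)):
                if y >= alpha:               # absorbed at the upper barrier
                    lhs += zm * exp(t * alpha) * w
                elif y <= -beta:             # absorbed at the lower barrier
                    lhs += zm * exp(-t * beta) * w
                else:
                    nxt[y] = nxt.get(y, 0.0) + w
        mass = nxt
    rhs = 1.0 + (z * phi - 1.0) * F
    return lhs, rhs
```

For instance, with $p = 0.4$, $\alpha = 3$, $\beta = 2$, $t = 0.3$ and $z = 0.9$ the two sides should agree to rounding error; putting $z = [\phi(t)]^{-1}$ makes the left-hand side equal to 1, which is Wald's identity.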
As far as the region of validity is concerned, we note that if $\alpha < \infty$, $\beta < \infty$
then $F(z, t)$ is an entire function of $t$ and a regular function of $z$ for

$$|z| < [\phi(t_0)]^{-1}.$$

The right hand side of (3.5) may thus be taken to define the left hand side for
$|z| < [\phi(t_0)]^{-1}$ and for those values of $t$ for which $\phi(t)$ exists.
In the case where $\alpha < \infty$ and $\beta = \infty$, we have

$$E(e^{tS} z^n) = \sum_{m=1}^{\infty} z^m \int_{\alpha}^{\infty} e^{tx}\, dF_m(x),$$

and an argument similar to that used in the proof of Lemma 3.1 shows that the
series on the right is convergent for

$$|z| < [\phi(u)]^{-1}, \quad t_0 \le u < a; \qquad |z| < [\phi(t_0)]^{-1}, \quad b < u \le t_0, \qquad (u = \mathrm{Re}(t)). \tag{3.6}$$


Thus for fixed $t$ the left hand side of (3.5) is a regular function of $z$ in the region
given by (3.6), while the $z$-region of regularity of $F(z, t)$ is given by Lemma
3.1. Thus, as functions of $z$, each side of (3.5) may be regarded as the analytic
continuation of the other beyond the common region of regularity, and the
identity therefore holds for $|z| < [\phi(t_0)]^{-1}$. Each side is clearly an analytic func-
tion of $t$ for $b < \mathrm{Re}(t) < a$.

A similar argument deals with the case where $\alpha = \infty$ and $\beta < \infty$, and the
theorem is therefore proved.

For the special case $\alpha = 0$, $\beta = \infty$, the identity (3.5) was proved by Spitzer
([9a], Theorem 3.1) who quotes the result as being due to G. Baxter.

The identity (3.5) is a generalization of Wald's identity and the latter follows
formally from (3.5) by putting $z = [\phi(t)]^{-1}$. In Theorem 3.2 we show the im-
portance of the point $t_0$, the minimum point of $\phi(t)$, in determining the region
of validity of Wald's identity.

THEOREM 3.2. (Wald's identity) If we adopt the convention of the previous
theorem regarding the expectation symbol $E$, then Wald's identity

$$E(e^{tS} [\phi(t)]^{-n}) = 1, \tag{3.7}$$

holds provided that $|\phi(t)| > \phi(t_0)$ and in addition $t$ satisfies

(3.8)   (i) $a > \mathrm{Re}(t) > t_0$ if $\alpha < \infty$, $\beta = \infty$;
        (ii) $b < \mathrm{Re}(t) < t_0$ if $\alpha = \infty$, $\beta < \infty$;
        (iii) $b < \mathrm{Re}(t) < a$ if $\alpha < \infty$, $\beta < \infty$.

PROOF. The result follows from the previous theorem by setting $z = [\phi(t)]^{-1}$ in
(3.5), and noting that $F(z, t)$ is finite if $|z| < [\phi(t_0)]^{-1}$ and if $t$ satisfies the con-
ditions (3.8) (Lemma 3.1). In general, we have, for $b < \mathrm{Re}(t) < a$ and
$|\phi(t)| > \phi(t_0)$,

$$E(e^{tS} [\phi(t)]^{-n}) = 1 + \lim_{z \to [\phi(t)]^{-1}} [z\phi(t) - 1] F(z, t).$$

It should be noted that the regions of validity of Wald's identity are sufficient
for applications such as the determination of the probability of absorption and
the characteristic function of $n$. For example, if $\alpha < \infty$, $\beta = \infty$ and $E(X_i) > 0$,
then $t_0 < 0$, and using the root $t = t_1(z)$ of $z\phi(t) = 1$ which satisfies $t_1(z) > t_0$
for $z$ real, we obtain the approximate relation (Wald [12]) $E\{\exp(\alpha t_1(z)) z^n\} = 1$
or $E(z^n) = \exp\{-\alpha t_1(z)\}$ approximately.
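As an illustration of the first application mentioned, the probability of absorption at the upper barrier can be obtained from Wald's identity and compared with a direct computation. The sketch below is not from the paper; it assumes the walk with steps $\pm 1$, $p \ne 1/2$, and integer barriers, so that $S$ equals $\alpha$ or $-\beta$ exactly and the identity with $\phi(t^*) = 1$, $e^{t^*} = q/p$, gives an exact rather than an approximate result. The function names are hypothetical.

```python
def upper_absorption_probability(p, alpha, beta, steps=1000):
    """P(S = alpha) for the +-1 walk, by propagating the unabsorbed mass."""
    q = 1.0 - p
    mass = {0: 1.0}
    hit_upper = 0.0
    for _ in range(steps):
        nxt = {}
        for x, pr in mass.items():
            for y, w in ((x + 1, p * pr), (x - 1, q * pr)):
                if y >= alpha:
                    hit_upper += w            # absorbed at alpha
                elif y > -beta:
                    nxt[y] = nxt.get(y, 0.0) + w
                # mass reaching -beta is absorbed there and dropped
        mass = nxt
    return hit_upper

def wald_upper_probability(p, alpha, beta):
    """Solve lam^alpha P + lam^(-beta) (1 - P) = 1 with lam = e^{t*} = q/p."""
    lam = (1.0 - p) / p                       # then p*lam + q/lam = 1, i.e. phi(t*) = 1
    return (1.0 - lam ** (-beta)) / (lam ** alpha - lam ** (-beta))
```

With $p = 0.4$, $\alpha = 3$, $\beta = 2$ both routes give approximately 0.1896.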


4. Further notes and generalizations.


(i) If $\phi(t)$ is not defined except on the imaginary axis, and if $\alpha$ and $\beta$ are
both finite, then the identity (3.5) is still valid although the region of validity
is not the same since $t_0$ has no meaning. Stein [10] showed that as a consequence
of the "leakage" of probability, $F_m(\alpha) - F_m(-\beta)$ tends to zero exponentially
as $m \to \infty$. Thus for purely imaginary $t$, the series (2.3), regarded as a power
series in $z$, has radius of convergence $1 + c$, where $c = c(\alpha, \beta) > 0$ and uni-
formly in $t$. Thus in (3.5) we may set $z = [\phi(t)]^{-1}$ provided that

$$|\phi(t)| > 1/(1 + c).$$


(ii) There is a connection between the identity (3.5) and the Wiener-Hopf
integral equation. In the following formal argument we suppose that the random
walk starts at $h > 0$ and that there is an absorbing barrier at the origin. We
assume that $a(x) = A'(x)$ exists and it follows that $f_m(x) = F_m'(x)$ also exists.
Let

$$f(z, x) = \sum_{m=0}^{\infty} z^m f_m(x),$$

where $f_0(x) = \delta(x - h)$, $\delta(x)$ being the Dirac delta function. Then the recur-
rence relation (2.2) becomes

$$f_m(x) = \int_0^{\infty} a(x - y) f_{m-1}(y)\, dy$$

and thus

$$f(z, x) - \delta(x - h) = z \int_0^{\infty} a(x - y) f(z, y)\, dy,$$

or

$$-\delta(x - h) = \int_0^{\infty} \{z a(x - y) - \delta(x - y)\} f(z, y)\, dy.$$

This is an integral equation of the Wiener-Hopf type with the difference kernel
$\{z a(x - y) - \delta(x - y)\}$ and holds for $x > 0$. The method of solution is to
assume that for $x < 0$, the left hand side is defined by some function $g(z, x)$
which vanishes for $x > 0$. Both sides of the equation are then transformed by
multiplying by $e^{tx}$ and integrating with respect to $x$ from $-\infty$ to $\infty$, thus ob-
taining

$$\int_{-\infty}^{0} e^{tx} g(z, x)\, dx - e^{th} = \{z\phi(t) - 1\} \int_0^{\infty} e^{ty} f(z, y)\, dy.$$

This is the identity (3.5) (in the generalized form (4.1) below) and it is seen
that the transform of the unknown function $g(z, x)$ is identified with $E(e^{tS} z^n)$.
(iii) If the random walk starts at the point $h$, then (3.5) becomes

$$E(e^{tS} z^n) = e^{th} + \{z\phi(t) - 1\} F(z, t). \tag{4.1}$$


(iv) It was shown by Wald [13] that (3.7) may be differentiated any number
of times under the expectation sign and then we may set $t = 0$. This procedure
can be used to obtain moment relations such as $E(S) = E(X)E(n)$. The dif-
ferentiation property is a simple consequence of the fact that $F(z, t)$ is a regular
function of $z$ at the point $z = 1$ when (a) $\alpha < \infty$, $\beta < \infty$ whether or not $\phi(t)$
is regular at $t = 0$, (b) $\alpha < \infty$, $\beta = \infty$, $E(X) > 0$ and $\phi(t)$ regular and (c)
$\alpha = \infty$, $\beta < \infty$, $E(X) < 0$ and $\phi(t)$ again regular. We cannot use complex
variable arguments to obtain moment relations in cases (b) and (c) when
$\phi(t)$ is not regular.

(v) Suppose that the steps of a random walk occur in continuous time and
are governed by several independent distributions $A_1(x), A_2(x), \cdots, A_N(x)$
where steps with distribution $A_k(x)$ occur in a Poisson process with mean rate
$r_k$ per unit time ($k = 1, 2, \cdots, N$). Let $S(\tau)$ be the displacement at time $\tau$ and
let the function $F_\tau(x)$ correspond to $F_m(x)$ in the discrete-time case. Thus

$$F_\tau(x) = P\{-\beta < S(\tau') < \alpha \text{ for } 0 < \tau' < \tau \text{ and } S(\tau) \le x\}.$$

Let

$$J(v, \theta) = \int_0^{\infty} e^{-v\tau} \Bigl\{\int_{-\beta}^{\alpha} e^{\theta x}\, dF_\tau(x)\Bigr\}\, d\tau$$

and $J(v, \theta)$ corresponds to $F(z, t)$ of (2.3). We now have $N$ moment generating
functions

$$\phi_k(\theta) = \int_{-\infty}^{\infty} e^{\theta x}\, dA_k(x), \qquad (k = 1, 2, \cdots, N).$$

Let $T$ be the time at which absorption takes place (corresponding to $n$ in the
discrete-time case), and let $S = S(T)$. Then the continuous-time analogue of
(3.5) is

$$E(e^{\theta S - vT}) = 1 + \Bigl[\sum_k r_k\{\phi_k(\theta) - 1\} - v\Bigr] J(v, \theta) \tag{4.2}$$

and the manner of derivation is similar. If we now set

$$v = \sum_k r_k\{\phi_k(\theta) - 1\},$$

then we obtain the corresponding form of Wald's identity

$$E\Bigl(\exp\Bigl[\theta S - T \sum_k r_k\{\phi_k(\theta) - 1\}\Bigr]\Bigr) = 1.$$

Dvoretzky, Kiefer and Wolfowitz [5a] have shown that Wald's identity holds
for processes in continuous time and have given applications to sequential tests
for such processes. In the third example of Section 5, we show how (4.2) may
be applied to a simple random walk in continuous time.


5. Three examples.

(i) We consider a discrete time random walk starting at the origin. The
steps $X_j$ ($j = 1, 2, \cdots$) are discrete and bounded. There is a single absorbing
barrier at $\alpha$ ($\alpha > 0$). This is a gambler's ruin problem, in which $\alpha$ is the gambler's
initial capital and $X_j$ is the adversary's gain at the $j$th play. Ruin corresponds
to absorption. It is well known that ruin is certain if $E(X_j) \ge 0$. We shall use
the identity (3.5) to obtain an expression for the probability of ruin in the case
where $E(X_j) < 0$. We now take $\alpha$ to be a positive integer.

The probability generating function for the $X_j$ is given by

$$M(w) = \sum_{k=-b}^{a} p_k w^k \qquad (0 < a < \infty,\ 0 < b < \infty).$$

We assume that

$$\text{g.c.d.}(k - k') = 1, \tag{5.1}$$

where $k$ and $k'$ run through all integers which satisfy $p_k \ne 0$ and $p_{k'} \ne 0$. The
identity (3.5) now assumes the form

$$E(w^S z^n) = 1 + \{zM(w) - 1\} H(z, w), \tag{5.2}$$

where

$$H(z, w) = \sum_{m=0}^{\infty} z^m \sum_{j=-\infty}^{\alpha-1} p_j^{(m)} w^j \tag{5.3}$$

and

$$p_j^{(m)} = P(S_k < \alpha \text{ for } k = 1, 2, \cdots, m - 1 \text{ and } S_m = j).$$

In this case, $w_0 > 0$ is defined by $M'(w_0) = 0$, and $w_0 \lessgtr 1$ according as $E(X) \gtrless 0$.
The series (5.3) converges for

$$|z| < [M(w_0)]^{-1}, \quad |w| \ge w_0,$$
$$|z| < [M(|w|)]^{-1}, \quad |w| < w_0.$$

Since $S$ can only take the values $\alpha, \alpha + 1, \cdots, \alpha + a - 1$, the left hand side
of (5.2) can be written as

$$E(w^S z^n) = \sum_{k=0}^{a-1} w^{\alpha+k} R_k(z), \tag{5.4}$$

where $R_k(z) = P(S = \alpha + k) E(z^n \mid S = \alpha + k)$. If $w(z)$ is a root of the equa-
tion

$$zM(w) = 1 \tag{5.5}$$

then from (5.2) and (5.4) we obtain

$$\sum_{k=0}^{a-1} [w(z)]^{\alpha+k} R_k(z) = 1. \tag{5.6}$$


There will in general be $a + b$ roots of (5.5) and P. Whittle has indicated
that, by a modification of an argument used by Lindley [8], we may show that
for $z < [M(w_0)]^{-1}$, $b$ of these roots lie inside the circle $|w| = w_0$ and $a$ of them
lie outside. For complex $z$ and $|z| < [M(w_0)]^{-1}$ we denote the roots which, when
$z$ is real lie outside the circle $|w| = w_0$, by $w_1(z), w_2(z), \cdots, w_a(z)$. For real $z$
we assume that these roots are distinct and satisfy

$$w_1(z) \le |w_k(z)|, \qquad k = 2, 3, \cdots, a.$$

It follows from the assumption (5.1) that the above inequality is strict in a
neighbourhood of $z = 1$.

Thus we obtain $a$ equations of the type (5.6), one from each of the roots
$w_k(z)$ ($k = 1, 2, \cdots, a$), and these may be solved for the $R_k(z)$. The solution
may be expressed in the form of a polynomial in $w$ which takes the value 1 at
the points $w = w_k(z)$ ($k = 1, 2, \cdots, a$). Hence

$$\sum_{k=0}^{a-1} w^{\alpha+k} R_k(z) = w^{\alpha} \sum_{k=1}^{a} [w_k(z)]^{-\alpha} \prod_{j \ne k} [\{w - w_j(z)\}/\{w_k(z) - w_j(z)\}].$$

We put $w = 1$ and $z = 1$ and write $w_k(1) = w_k$, thus obtaining

$$P(n < \infty) = \sum_{k=0}^{a-1} R_k(1) = \sum_{k=1}^{a} w_k^{-\alpha} \prod_{j \ne k} \{(1 - w_j)/(w_k - w_j)\}. \tag{5.7}$$

If $E(X) > 0$, then $w_1 = 1$ and hence $P(n < \infty) = 1$. This is well known. If
$E(X) < 0$ then $w_1 > 1$, and (5.7) is an explicit expression for the probability
of ruin or absorption. Also, as $\alpha \to \infty$, we have the asymptotic relation

$$P(n < \infty) = w_1^{-\alpha} \Bigl[\prod_{j=2}^{a} \{(w_j - 1)/(w_j - w_1)\}\Bigr] \{1 + O(\delta^{\alpha})\},$$

where $\delta$ satisfies $0 < \delta < 1$. The usual approximation for $P(n < \infty)$ is

$$P(n < \infty) = w_1^{-\alpha},$$

approximately. See, for example, Wald [12] and Bartlett ([2], p. 20).
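Formula (5.7) is easy to evaluate once the outer roots are known. The sketch below is an illustration only: the function takes the roots $w_1, \cdots, w_a$ as given, and the worked example is a hypothetical step distribution $P(X = 2) = 0.1$, $P(X = 1) = 0.2$, $P(X = -1) = 0.7$, for which $M(w) = 1$ reduces to $(w - 1)(w^2 + 3w - 7) = 0$, so that the two roots outside $|w| = w_0$ are $(-3 \pm \sqrt{37})/2$.

```python
from math import sqrt

def ruin_probability(outer_roots, alpha):
    """Evaluate (5.7) from the roots w_1, ..., w_a of M(w) = 1 lying outside |w| = w_0."""
    total = 0.0
    for k, wk in enumerate(outer_roots):
        term = wk ** (-alpha)
        for j, wj in enumerate(outer_roots):
            if j != k:
                term *= (1 - wj) / (wk - wj)
        total += term
    return total

# hypothetical example: steps +2, +1, -1 with probabilities 0.1, 0.2, 0.7
example_roots = [(-3 + sqrt(37)) / 2, (-3 - sqrt(37)) / 2]
```

For this example `ruin_probability(example_roots, 1)` gives 4/7, which can be confirmed from the one-step recurrence for the ruin probability; with a single root $w_1 = q/p$ the function reduces to the classical $(p/q)^{\alpha}$.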
(ii) In this example we consider the classical random walk for which

$P(X = 1) = p, \qquad P(X = -1) = q = 1 - p,$

and we use the explicit results known for this case to illustrate the regions of convergence discussed in Lemma 3.1. It will be seen that in the single barrier case the regions given by Lemma 3.1 are sharper than in the two barrier case. Feller ([6], pp. 318-323) discusses this random walk in detail. Again, $\alpha$ and $\beta$ are positive integers.

Using the notation of the previous example we have $M(w) = pw + qw^{-1}$. The roots of $zM(w) = 1$ are

$w_{1,2}(z) = \{1 \pm (1 - 4pqz^2)^{1/2}\}(2pz)^{-1}.$

If $\alpha < \infty$ and $\beta = \infty$, then $H(z, w)$ (5.3) is given by

$H(z, w) = \{w^{\alpha}[w_1(z)]^{-\alpha} - 1\}/[pz\{1 - w_1(z)w^{-1}\}\{w - w_2(z)\}],$


the method of derivation being the same as in the previous example. The factor $w - w_1(z)$ divides both the numerator and denominator of $H(z, w)$, and therefore $w = w_1(z)$ is not a singularity of $H(z, w)$. On the other hand $w = w_2(z)$ is a singularity of $H(z, w)$. Now $w = w_1(z)$ corresponds to $z = [M(w)]^{-1}$ in the region $|w| > w_0$, while $w = w_2(z)$ corresponds to $z = [M(w)]^{-1}$ in the region $|w| < w_0$, where $w_0 = (q/p)^{1/2}$ in this case. Thus in the region $|w| < w_0$, the series for $H(z, w)$ is convergent for $|z| < |M(w)|^{-1}$, while in the region $|w| > w_0$, the series is convergent for $|z| < [M(w_0)]^{-1} = \tfrac{1}{2}(pq)^{-1/2}$. Lemma 3.1 gives these $z$-regions as $|z| < [M(|w|)]^{-1}$ and $|z| < [M(w_0)]^{-1}$ respectively.

If we consider the two barrier case, then from Feller's results it is not difficult to show that the $z$-singularity of $H(z, w)$ nearest the origin is

$z = [2(pq)^{1/2} \cos\{\pi/(\alpha + \beta)\}]^{-1} > [M(w_0)]^{-1},$

which means that the series for $H(z, w)$ actually converges in a region larger than that given by Lemma 3.1.
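The quantities in example (ii) are easy to check numerically. The following is a minimal sketch, not part of the original argument: the function names and the values $p = 0.4$, $z = 0.9$, $\alpha = 3$, $\beta = 4$ are illustrative assumptions. It evaluates the two roots of $zM(w) = 1$, confirms that they lie on either side of the circle $|w| = w_0$, and compares the single barrier radius $[M(w_0)]^{-1}$ with the two barrier singularity quoted from Feller.

```python
import cmath
import math


def M(p, w):
    """Moment generating function M(w) = p*w + q/w of a single +/-1 step."""
    return p * w + (1.0 - p) / w


def roots(p, z):
    """Roots w_1(z), w_2(z) of z*M(w) = 1, i.e. of p*z*w**2 - w + q*z = 0."""
    q = 1.0 - p
    disc = cmath.sqrt(1.0 - 4.0 * p * q * z * z)
    return (1.0 + disc) / (2.0 * p * z), (1.0 - disc) / (2.0 * p * z)


def two_barrier_radius(p, alpha, beta):
    """Nearest z-singularity in the two barrier case (formula quoted in the text)."""
    q = 1.0 - p
    return 1.0 / (2.0 * math.sqrt(p * q) * math.cos(math.pi / (alpha + beta)))


# Illustrative values only (assumptions, not taken from the paper).
p, z = 0.4, 0.9
q = 1.0 - p
w0 = math.sqrt(q / p)
w1, w2 = roots(p, z)
single_barrier_radius = 1.0 / M(p, w0)

# At z = 1 with E(X) = p - q < 0 the outer root is q/p, so the usual
# ruin probability w_1**(-alpha) reduces to (p/q)**alpha.
alpha = 3
w1_at_one, _ = roots(p, 1.0)
ruin_probability = abs(w1_at_one) ** (-alpha)

print(abs(w1), w0, abs(w2))
print(single_barrier_radius, two_barrier_radius(p, 3, 4))
print(ruin_probability)
```

With one step up and one step down ($a = b = 1$) the sum in (5.7) has a single term, which is why the sketch uses $w_1^{-\alpha}$ directly.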

(iii) As an application of the identity (4.2) we consider a random walk in continuous time, in which steps of $+1$ occur in a Poisson process with mean rate $r_1$, and steps of $-1$ occur in a Poisson process with mean rate $r_2$. The barriers are again at $\alpha$ and $-\beta$ ($\alpha$, $\beta$ positive integers). The use of (4.2) in this case is made simple by the fact that the walk will terminate exactly on a barrier. In the notation of Section 4(v), we have $\phi_1(\theta) = e^{\theta}$ and $\phi_2(\theta) = e^{-\theta}$. If we write $w = e^{\theta}$, then (4.2) takes the form

(5.8) $E(w^{S} e^{-vT}) = 1 + \{r_1(w - 1) + r_2(w^{-1} - 1) - v\} K(v, w),$

where $K(v, w) = J(v, \log w)$. Let $w_1(v)$ and $w_2(v)$ be the roots of

$r_1(w - 1) + r_2(w^{-1} - 1) = v.$

Then

$w_{1,2}(v) = (2r_1)^{-1}[r_1 + r_2 + v \pm \{(r_1 + r_2 + v)^2 - 4r_1 r_2\}^{1/2}].$

Let $p_1 = P(S = \alpha)$, $p_2 = P(S = -\beta)$ and let $E_1$, $E_2$ denote expectations conditional on $S = \alpha$, $S = -\beta$ respectively. Then the left hand side of (5.8) may be written $w^{\alpha} p_1 E_1(e^{-vT}) + w^{-\beta} p_2 E_2(e^{-vT})$ and on setting $w = w_i(v)$ $(i = 1, 2)$ in (5.8), we obtain two equations

$\{w_i(v)\}^{\alpha} p_1 E_1(e^{-vT}) + \{w_i(v)\}^{-\beta} p_2 E_2(e^{-vT}) = 1, \qquad i = 1, 2,$

whence, if we write $w_1 = w_1(v)$ and $w_2 = w_2(v)$,

$p_1 E_1(e^{-vT}) = (w_2^{-\beta} - w_1^{-\beta})/(w_2^{-\beta} w_1^{\alpha} - w_2^{\alpha} w_1^{-\beta})$

and

$p_2 E_2(e^{-vT}) = (w_2^{\alpha} - w_1^{\alpha})/(w_1^{-\beta} w_2^{\alpha} - w_1^{\alpha} w_2^{-\beta}).$

The above expressions are Laplace transforms of the two conditional distributions of $T$, the time to absorption. For further details we refer to the paper of Heathcote and Moyal [7], in which the above result is obtained using difference equations and in which this random walk is discussed in detail.
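As a check on example (iii), the two linear equations can be solved numerically. The sketch below is an illustration under stated assumptions: it takes the transform to be $E(e^{-vT})$ with $v \ge 0$, so that the roots solve $r_1 w^2 - (r_1 + r_2 + v)w + r_2 = 0$, and the function names and the rates $r_1 = 1$, $r_2 = 2$ are hypothetical. At $v = 0$ the two transforms reduce to the absorption probabilities, which must sum to one and agree with the classical ruin formula for the embedded $\pm 1$ walk.

```python
import math


def roots(r1, r2, v):
    """Roots of r1*(w - 1) + r2*(1/w - 1) = v, written as a quadratic in w."""
    s = r1 + r2 + v
    disc = math.sqrt(s * s - 4.0 * r1 * r2)
    return (s + disc) / (2.0 * r1), (s - disc) / (2.0 * r1)


def absorption_transforms(r1, r2, alpha, beta, v):
    """Solve w_i**alpha * A + w_i**(-beta) * B = 1 (i = 1, 2) by Cramer's rule.

    A stands for p1*E1(exp(-v*T)) and B for p2*E2(exp(-v*T)).
    """
    w1, w2 = roots(r1, r2, v)
    det = w1 ** alpha * w2 ** (-beta) - w2 ** alpha * w1 ** (-beta)
    upper = (w2 ** (-beta) - w1 ** (-beta)) / det
    lower = (w1 ** alpha - w2 ** alpha) / det
    return upper, lower


# Illustrative rates and barriers (assumptions, not taken from the paper).
r1, r2, alpha, beta = 1.0, 2.0, 2, 3

# v = 0: the transforms are the absorption probabilities p1 and p2.
p1, p2 = absorption_transforms(r1, r2, alpha, beta, 0.0)

# Classical ruin formula for the embedded walk, with rho = r2/r1.
rho = r2 / r1
p1_classical = (1.0 - rho ** beta) / (1.0 - rho ** (alpha + beta))

# v > 0: each transform is damped, so the sum falls strictly below one.
a_v, b_v = absorption_transforms(r1, r2, alpha, beta, 0.5)

print(p1, p2, p1_classical)
print(a_v, b_v)
```

The case $r_1 = r_2$ with $v = 0$ gives a double root and a zero determinant, so it would need a separate limiting argument.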







6. Acknowledgments. I should like to thank Dr. P. Whittle for many helpful suggestions. Also, I should like to acknowledge a grant from the South African Council for Scientific and Industrial Research.

REFERENCES 

[1] R. R. Bahadur, "A note on the fundamental identity of sequential analysis," Ann. Math. Stat., Vol. 29 (1958), pp. 534-543.
[2] M. S. Bartlett, Stochastic Processes, Cambridge University Press, London, 1955.
[3] Richard Bellman, "On a generalization of the fundamental identity of Wald," Proc. Camb. Phil. Soc., Vol. 53 (1957), pp. 257-259.
[4] D. Blackwell and M. A. Girshick, "On functions of sequences of independent chance vectors with applications to the problem of the 'random walk' in k dimensions," Ann. Math. Stat., Vol. 17 (1946), pp. 310-317.
[5] J. L. Doob, Stochastic Processes, John Wiley and Sons, London, 1953.
[5a] A. Dvoretzky, J. Kiefer and J. Wolfowitz, "Sequential decision problems for processes with continuous time parameter. Testing hypotheses," Ann. Math. Stat., Vol. 24 (1953), pp. 254-264.
[6] William Feller, An Introduction to Probability Theory and its Applications, 2nd ed., John Wiley and Sons, London, 1957.
[7] C. R. Heathcote and J. E. Moyal, "The random walk (in continuous time) and its application to the theory of queues," Biometrika, Vol. 46 (1959), pp. 400-411.
[8] D. V. Lindley, "The theory of queues with a single server," Proc. Camb. Phil. Soc., Vol. 48 (1952), pp. 277-289.
[9] Harold Ruben, "A theorem on the cumulative product of independent random variables," Proc. Camb. Phil. Soc., Vol. 55 (1959), pp. 332-337.
[9a] Frank Spitzer, "A Tauberian theorem and its probability interpretation," Trans. Amer. Math. Soc., Vol. 94 (1960), pp. 150-169.
[10] Charles Stein, "A note on cumulative sums," Ann. Math. Stat., Vol. 17 (1946), pp. 498-499.
[11] M. C. K. Tweedie, "Generalizations of Wald's fundamental identity of sequential analysis to Markov chains," Proc. Camb. Phil. Soc., Vol. 56 (1960), pp. 205-214.
[12] Abraham Wald, Sequential Analysis, John Wiley and Sons, London, 1947.
[13] Abraham Wald, "Differentiation under the expectation sign in the fundamental identity of sequential analysis," Ann. Math. Stat., Vol. 17 (1946), pp. 493-497.
[14] D. V. Widder, The Laplace Transform, Princeton University Press, London, 1946.








A CHARACTERIZATION OF THE WEAK CONVERGENCE OF 
MEASURES 


By Robert Bartoszyński¹


University of California, Berkeley²


0. Summary. In this paper we shall investigate the so-called weak conver- 
gence of measures. Although the origin of the concept of the weak convergence 
of measures is a probabilistic one, the concept itself is purely measure-theoretical, 
and should be, therefore, treated by measure-theoretical methods. In Probability Theory the notion of the weak convergence of measures first appeared in the Central Limit Problem. Its full importance, however, has been recognized only recently. It is now known as Donsker's Invariance Principle.

In this paper we shall follow Prohorov’s approach, as presented in [1]. The 
list of all necessary definitions and results is given in the Introduction. 

We shall give some conditions for the weak convergence of measures in 
separable and complete metric spaces, which are expressed in terms of conver- 
gence of measures generated in finite dimensional Euclidean spaces. The last 
convergence can be treated by standard mathematical tools, like the Theory 
of Fourier Transformations. It should be noted that our theorems concerning 
the convergence of measures in separable complete metric spaces remain valid 
if we omit the assumption of completeness. The proofs will remain essentially 
unchanged; only instead of dealing with compact sets, we should deal with to- 
tally bounded closed sets. 

The theorems given in Section 4 are of interest for the Theory of Stochastic Processes, since they give the conditions for the weak convergence of measures in the functional spaces $D[0, 1]$ and $C[0, 1]$, and to a large class of stochastic processes there correspond measures generated in space $D[0, 1]$ or $C[0, 1]$, and these measures are usually given in terms of $\mu^{t_1, \ldots, t_m}$, i.e. in terms of finite dimensional distribution functions of the process.


1. Introduction. Let $R$ be a complete separable metric space with the metric $\rho$. Denote by $M(R)$ the space of all finite measures defined on the Borel $\sigma$-field of subsets of $R$. A sequence $\mu_n$ of elements of $M(R)$ will be called weakly convergent to $\mu_0 \in M(R)$ if for every bounded and continuous function $f(x)$ on $R$

(1.1) $\lim_{n \to \infty} \int_R f(x)\,\mu_n(dx) = \int_R f(x)\,\mu_0(dx).$

We shall denote weak convergence by $\Rightarrow$. The following Theorems A-F can be found in [1]:


Received February 16, 1959; revised August 12, 1960. 
¹ With the partial support of the U. S. National Science Foundation under Grant G-4210.

² Present address: Mathematical Institute of the Polish Academy of Sciences, Warsaw, Poland.




THEOREM A. Let $\mu_n \in M(R)$, $n = 0, 1, 2, \ldots$. Then $\mu_n \Rightarrow \mu_0$ if, and only if, $\lim_{n \to \infty} \mu_n(R) = \mu_0(R)$ and $\limsup_{n \to \infty} \mu_n(F) \le \mu_0(F)$ for every closed set $F \subset R$.
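A small numerical illustration of definition (1.1) and of the closed-set condition in Theorem A may help. The sketch below is an assumption-laden example, not taken from the paper: it uses unit point masses at $1/n$ on the real line, a hypothetical bounded continuous test function, and the closed set $F = \{0\}$. The integrals converge to the integral against the point mass at 0, while the masses of $F$ stay at 0 and the limit measure gives $F$ mass 1, so the inequality in Theorem A holds strictly.

```python
import math


def f(x):
    """A bounded continuous test function on the real line (illustrative choice)."""
    return math.cos(x) / (1.0 + x * x)


def integral_against_point_mass(g, location):
    """The integral of g against a unit point mass at `location` is g(location)."""
    return g(location)


def mass_of_singleton_zero(location):
    """Mass that the unit point mass at `location` assigns to the closed set {0}."""
    return 1.0 if location == 0.0 else 0.0


ns = [1, 10, 100, 1000, 10000]

# Left side of (1.1) along the sequence, and its claimed limit.
integrals = [integral_against_point_mass(f, 1.0 / n) for n in ns]
limit_integral = integral_against_point_mass(f, 0.0)

# Closed-set condition of Theorem A for F = {0}.
closed_set_masses = [mass_of_singleton_zero(1.0 / n) for n in ns]
limit_mass = mass_of_singleton_zero(0.0)

print(integrals, limit_integral)
print(closed_set_masses, limit_mass)
```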

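Let $\mu_1, \mu_2 \in M(R)$. Denote by $\varepsilon_{1,2}$ (resp. $\varepsilon_{2,1}$) the greatest lower bound of those $\varepsilon$ for which, for every closed set $F \subset R$, we have $\mu_1(F) \le \mu_2(F^{\varepsilon}) + \varepsilon$ (resp. $\mu_2(F) \le \mu_1(F^{\varepsilon}) + \varepsilon$), where $F^{\varepsilon}$ denotes the $\varepsilon$-neighborhood of the closed set $F$. Let

(1.2) $L(\mu_1, \mu_2) = \max(\varepsilon_{1,2}, \varepsilon_{2,1}).$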


The following theorem holds: 

THEOREM B. The function $L$, defined by (1.2), is a metric in the space $M(R)$, and the conditions $\mu_n \Rightarrow \mu_0$ and $L(\mu_n, \mu_0) \to 0$ are equivalent. Moreover, $M(R)$ with the metric $L$ is a complete separable space.

A condition for compactness of subsets of M(R) is given by the following 
theorem: 

THEOREM C. The set $B \subset M(R)$ is compact if and only if $\sup_{\mu \in B} \mu(R) < \infty$ and for every $\varepsilon > 0$ there exists a compact set $K_{\varepsilon} \subset R$ such that $\sup_{\mu \in B} \mu(K_{\varepsilon}^{c}) < \varepsilon$.

Let $R^*$ be a complete separable metric space and let $\mu \in M(R)$. If $f$ is a continuous function mapping $R$ into $R^*$, then the condition $\mu^{f}(A) = \mu\{f^{-1}(A)\}$ for the $\mu$-measurable $f^{-1}(A)$ defines the measure $\mu^{f} \in M(R^*)$. The following theorem holds:

THEOREM D. The condition $\mu_n \Rightarrow \mu_0$ holds if and only if for every real $\mu_0$-almost everywhere continuous function $f$ on $R$ we have $\mu_n^{f} \Rightarrow \mu_0^{f}$.

REMARK 1. In the definition of metric $L$, it is sufficient to take the greatest lower bound with respect to compact sets only. In fact, let the inequality $\mu_1(K) \le \mu_2(K^{\varepsilon}) + \varepsilon$ hold for all compact sets $K \subset R$ and let $F \subset R$ be an arbitrary closed set. Take a sequence $\{K_n\}$ of compact sets, such that $K_n \subset K_{n+1}$, $n = 1, 2, \ldots$, and $\mu_1[(\bigcup_{n=1}^{\infty} K_n)^{c}] = 0$ (see, for example, [2]). Then, for every $n$ we can write

$\mu_1(F \cap K_n) \le \mu_2[(F \cap K_n)^{\varepsilon}] + \varepsilon \le \mu_2(F^{\varepsilon}) + \varepsilon,$

and on the left hand side we can pass to the limit with $n \to \infty$, obtaining

$\mu_1(F) = \mu_1\Big(F \cap \bigcup_{n=1}^{\infty} K_n\Big) \le \mu_2(F^{\varepsilon}) + \varepsilon.$


REMARK 2. An analogous distance of measures has been defined by Lévy when the space $R$ is one-dimensional Euclidean space. He defined the distance between measures $\mu_1$ and $\mu_2$ as

(1.3) $L^*(\mu_1, \mu_2) = \inf\{\varepsilon;\ \text{for every } x:\ F_1(x - \varepsilon) - \varepsilon \le F_2(x) \le F_1(x + \varepsilon) + \varepsilon\},$

where $F_1(x)$ and $F_2(x)$ are the distribution functions of the measures $\mu_1$ and $\mu_2$, respectively.


In this paper $A^{c}$ will denote the complement of $A$.


Generally, if $F_1(x_1, \ldots, x_m)$ and $F_2(x_1, \ldots, x_m)$ are the distribution functions of measures $\mu_1$ and $\mu_2$ in $m$-dimensional Euclidean space $R_m$, then we can define the Lévy distance $L^*(F_1, F_2)$ as

(1.4) $\inf\{\varepsilon;\ \text{for every } x_1, \ldots, x_m:\ F_1(x_1 - \varepsilon, \ldots, x_m - \varepsilon) - \varepsilon \le F_2(x_1, \ldots, x_m) \le F_1(x_1 + \varepsilon, \ldots, x_m + \varepsilon) + \varepsilon\}.$

The convergence $L^*(F_n, F_0) \to 0$ is equivalent to the condition that

$\lim_{n \to \infty} F_n(x_1, \ldots, x_m) = F_0(x_1, \ldots, x_m)$

at every continuity point of the function $F_0(x_1, \ldots, x_m)$, and therefore, it is also equivalent to the weak convergence $\mu_n \Rightarrow \mu_0$ of the corresponding measures (see, for example, [3]).
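The one-dimensional Lévy distance (1.3) can be approximated on a grid. The sketch below is an illustration only: the function names are hypothetical, the condition "for every $x$" is checked on a finite grid (so the result is an approximation from below), and the inputs are assumed to be distribution functions of probability measures, for which $\varepsilon = 1$ is always admissible. Because the defining condition is monotone in $\varepsilon$, a bisection suffices. For unit point masses at 0 and at 0.3 the distance should be close to 0.3.

```python
def levy_distance(F1, F2, xs, tol=1e-6):
    """Grid approximation to (1.3); assumes F1, F2 are probability CDFs."""

    def admissible(eps):
        # The defining inequality of (1.3), checked only at the grid points.
        return all(F1(x - eps) - eps <= F2(x) <= F1(x + eps) + eps for x in xs)

    if admissible(0.0):
        return 0.0
    lo, hi = 0.0, 1.0  # eps = 1 is always admissible for probability CDFs
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return hi


def point_mass_cdf(a):
    """Right-continuous distribution function of a unit point mass at a."""
    return lambda x: 1.0 if x >= a else 0.0


# Grid of step 0.001 on [-2, 2] (an arbitrary illustrative choice).
xs = [-2.0 + 0.001 * i for i in range(4001)]
d12 = levy_distance(point_mass_cdf(0.0), point_mass_cdf(0.3), xs)
d21 = levy_distance(point_mass_cdf(0.3), point_mass_cdf(0.0), xs)
print(d12, d21)
```

The two orderings agree up to the grid resolution, as expected from the symmetry of the distance.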
In this paper, whenever the $m$-dimensional Euclidean space $R_m$ is considered it will be assumed that the metric in this space is defined as

$\rho(\{x_1, \ldots, x_m\}, \{y_1, \ldots, y_m\}) = \max_{1 \le k \le m} |x_k - y_k|.$


By $C[0, 1]$ we shall denote the space of all real continuous functions $f(t)$ on $[0, 1]$ with the uniform metric

$c(g, h) = \sup_{0 \le t \le 1} |g(t) - h(t)|.$


Denote by $D[0, 1]$ the space of all real functions $f(t)$ on $[0, 1]$ satisfying the following conditions:

(a) the limits $f(t + 0)$ and $f(t - 0)$ exist at every point $t \in (0, 1)$ and the limits $f(0+)$ and $f(1 - 0)$ exist at the points $t = 0$ and $t = 1$, respectively.

(b) at every point $t \in [0, 1]$ one of the two equalities $f(t) = f(t + 0)$ and $f(t) = f(t - 0)$ holds.

We shall add to the definition of the space $D[0, 1]$ the usual convention that every two functions $f_1(t)$ and $f_2(t)$ for which the equalities $f_1(t + 0) = f_2(t + 0)$ and $f_1(t - 0) = f_2(t - 0)$ are satisfied for all $t \in [0, 1]$ will be considered as one element of $D[0, 1]$.

Let $f \in D[0, 1]$ and let $\Gamma_f$ be the graph of function $f$, that is, the set of points $(t, u)$ such that $t \in [0, 1]$ and $u$ satisfies one of the following inequalities:

$f(t - 0) \le u \le f(t + 0)$ and $f(t + 0) \le u \le f(t - 0).$

Note that every graph is a bounded closed set on the plane. Let

$w_f(\Delta) = \sup_{t_1, t_2 \in \Delta} |f(t_1) - f(t_2)|.$

We shall consider two functions

$w_f(\alpha) = \sup_{\Delta;\, |\Delta| \le \alpha} w_f(\Delta)$

(1.6) $\tilde{w}_f(\alpha) = \sup_{\Delta;\, |\Delta| \le \alpha}\ \sup_{\tau \in \Delta} \min\big(w_f\{[\tau_1, \tau)\},\ w_f\{(\tau, \tau_2]\}\big),$

where $\Delta$ denotes the interval $[\tau_1, \tau_2]$. The following propositions have been proved by Prohorov [1]:

I. If the function $f$ has no jump greater than $c$, then

(1.7) $w_f(\alpha) \le 4(\tilde{w}_f(\alpha) + c)$ for all $0 \le \alpha \le 1$.

II. $\tilde{w}_f(\alpha)$ is a non-decreasing function, and $\tilde{w}_f(\alpha) \downarrow 0$ as $\alpha \downarrow 0$.
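Let

(1.8) $r_f(z) = \begin{cases} \tilde{w}_f(e^{z}) & \text{for } z \le 0, \\ w_f(1) & \text{for } z > 0. \end{cases}$

Let $f, g \in D[0, 1]$. Define

(1.9) $d_1(f, g) = \max\Big\{\sup_{p \in \Gamma_f} \inf_{q \in \Gamma_g} |p - q|,\ \sup_{p \in \Gamma_g} \inf_{q \in \Gamma_f} |p - q|\Big\}$

and

(1.10) $d_2(f, g) = L^*(r_f(z), r_g(z)),$

where $L^*$ is the Lévy distance defined by (1.3). Then the following theorem holds: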


THEOREM E. The function

(1.11) $d(f, g) = d_1(f, g) + d_2(f, g)$

defines a metric in the space $D[0, 1]$. The space $D[0, 1]$ with the metric $d$ is separable and complete, the subspace $C[0, 1]$ is a closed set in $D[0, 1]$ and for the subspace $C[0, 1]$ the $d$-convergence is equivalent to the uniform one. Moreover, if $d(f_n, f) \to 0$, then $f_n(t) \to f(t)$ at every point of continuity of $f(t)$.

Conditions for compactness of the subsets of the space D[0, 1] are given by 
the theorem: 

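THEOREM F. The set $B \subset D[0, 1]$ is compact if and only if there exists a constant $M > 0$ and a function $h(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$, such that for all $f \in B$

$\sup_{0 \le t \le 1} |f(t)| < M,$

$\tilde{w}_f(\varepsilon) \le h(\varepsilon)$ for $0 \le \varepsilon \le 1$.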


Now we shall prove the following inequality

(1.12) $d(f, g) \le 3 \sup_{0 \le t \le 1} |f(t) - g(t)|.$

In fact, if $\sup_{0 \le t \le 1} |f(t) - g(t)| \le \varepsilon$, then, of course, $d_1(f, g) \le \varepsilon$. Since the functions $\tilde{w}_f(\alpha)$ and $\tilde{w}_g(\alpha)$ satisfy the inequality $|\tilde{w}_f(\alpha) - \tilde{w}_g(\alpha)| \le 2\varepsilon$ for every $\alpha$, then also $|r_f(z) - r_g(z)| \le 2\varepsilon$ for every $z$, hence $d_2(f, g) \le 2\varepsilon$, which was to be proved.







Let $t_1, \ldots, t_m$ be fixed points from $[0, 1]$. Denote by $\varphi = \varphi_{t_1, \ldots, t_m}$ the function mapping $D[0, 1]$ (or $C[0, 1]$) into $R_m$, defined as $\varphi(f) = \{f(t_1 - 0), \ldots, f(t_m - 0)\}$. For $\mu \in M(D[0, 1])$ (or $\mu \in M(C[0, 1])$), we shall denote by $\mu^{t_1, \ldots, t_m}$ the measure in $M(R_m)$ defined as $\mu^{t_1, \ldots, t_m}(A) = \mu\{\varphi^{-1}(A)\}$ for $\mu$-measurable $\varphi^{-1}(A)$.


2. Convergence of measures in metric spaces. In the present section, $R$ will denote an arbitrary fixed complete separable metric space with the metric $\rho$. Let $f_1(x), f_2(x), \ldots$ be a fixed sequence of continuous mappings of $R$ into the real line $R_1$. Suppose that for every $x$ we have $\sup_n |f_n(x)| < \infty$.

Denote by $\varphi_k$ the mapping of $R$ into $k$-dimensional Euclidean space $R_k$ defined as $\varphi_k(x) = \{f_1(x), \ldots, f_k(x)\}$. If $\mu \in M(R)$, we shall write $\mu^{k} = \mu^{\varphi_k}$. Further we shall use the notation $\rho^*(x, y) = \sup_n |f_n(x) - f_n(y)|$.

The following theorems hold: 

THEOREM 1. If the functions $f_1(x), f_2(x), \ldots$ are equicontinuous at each point $x \in R$, that is, if the condition $\rho(x_m, x) \to 0$ implies $\rho^*(x_m, x) \to 0$, then a necessary condition for the convergence $\mu_n \Rightarrow \mu_0$ ($\mu_n \in M(R)$, $n = 0, 1, \ldots$) is

(2.1) $\lim_{n \to \infty} \sup_k L(\mu_n^{k}, \mu_0^{k}) = 0.$

THEOREM 2. If the condition $\rho^*(x_m, x) \to 0$ implies $\rho(x_m, x) \to 0$, then (2.1) is a sufficient condition for the convergence $\mu_n \Rightarrow \mu_0$ ($\mu_n \in M(R)$, $n = 0, 1, \ldots$).

To prove these theorems we need some lemmas giving the connections be- 
tween e-neighborhoods in the spaces R and R;,. 

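Lemma 1. If the conditions of Theorem 1 are satisfied, then for an arbitrary compact set $D \subset R$, any integer $k$ and any set $F \subset R_k$,

$D \cap [\varphi_k^{-1}(F)]^{\varepsilon} \subset \varphi_k^{-1}(F^{m_D(\varepsilon)}),$

where $m_D(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$.

Proof. At first suppose that $F = \{x_1, \ldots, x_k\}$ and let $\xi \in D \cap [\varphi_k^{-1}(F)]^{\varepsilon}$. It follows that there exists $\eta \in \varphi_k^{-1}(F)$ such that $\rho(\xi, \eta) < \varepsilon$. Since $D$ is supposed to be compact it follows that $\rho^*(\xi, \eta) < m_D(\varepsilon)$, where $m_D(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$. Then also

$\max_{1 \le i \le k} |f_i(\xi) - f_i(\eta)| < m_D(\varepsilon)$

and since $f_i(\eta) = x_i$ $(i = 1, \ldots, k)$, we have

$\xi \in \varphi_k^{-1}(\{x_1, \ldots, x_k\}^{m_D(\varepsilon)}).$

To complete the proof it is sufficient to note that pre-images and $\varepsilon$-neighborhoods are additive and $m_D(\varepsilon)$ does not depend on $\{x_1, \ldots, x_k\}$.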
Lemma 2. If $F \subset R$ is compact, then for arbitrary $\varepsilon > 0$ and $\delta > 0$

$\bigcap_{k=1}^{\infty} \varphi_k^{-1}[\varphi_k(F)^{\varepsilon}] \subset F_*^{\varepsilon + \delta},$

where $F_*^{\alpha}$ denotes the $\alpha$-neighborhood of the set $F$ in the metric $\rho^*$.

Proof. Let $\varepsilon > 0$ and $\delta > 0$ be arbitrary, and let $\xi \in \bigcap_{k=1}^{\infty} \varphi_k^{-1}[\varphi_k(F)^{\varepsilon}]$. It follows that for every $k$ we have $\varphi_k(\xi) \in \varphi_k(F)^{\varepsilon}$. Then there exists $\eta_k \in F$ such that $\max_{1 \le i \le k} |f_i(\xi) - f_i(\eta_k)| < \varepsilon$. Since, by assumption, $F$ is compact, we can select a convergent subsequence from the sequence $\{\eta_k\}$. Suppose, without loss of generality, that $\eta_k \to \eta \in F$. Let $n$ be an arbitrary integer. Then

$|f_n(\xi) - f_n(\eta)| \le |f_n(\xi) - f_n(\eta_k)| + |f_n(\eta_k) - f_n(\eta)|.$

For a sufficiently large $k$, the first term on the right hand side of the last formula is less than $\varepsilon$ and the second is less than $\delta$. Hence it follows that

$\rho^*(\xi, \eta) = \sup_n |f_n(\xi) - f_n(\eta)| < \varepsilon + \delta,$

which was to be proved.

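Proof of Theorem 1. Suppose that $\mu_n \Rightarrow \mu_0$. Then $L(\mu_n, \mu_0) \to 0$ and for any $\delta > 0$ there exists a compact set $D_{\delta}$ such that $\sup_n \mu_n(D_{\delta}^{c}) < \delta$ (see [1]). Let $L(\mu_n, \mu_0) \le \alpha$; then it follows that for every closed set $A \subset R$ we have $\mu_n(A) \le \mu_0(A^{\alpha}) + \alpha$ and $\mu_0(A) \le \mu_n(A^{\alpha}) + \alpha$. Let $k$ be an arbitrary integer and $F \subset R_k$ be an arbitrary closed set. Then, by Lemma 1 and the fact that the set $\varphi_k^{-1}(F)$ is closed, we obtain

$\mu_0^{k}(F) = \mu_0\{\varphi_k^{-1}(F)\} \le \mu_n\{[\varphi_k^{-1}(F)]^{\alpha}\} + \alpha \le \mu_n\{D_{\delta} \cap [\varphi_k^{-1}(F)]^{\alpha}\} + \alpha + \delta \le \mu_n\{\varphi_k^{-1}(F^{m_{\delta}(\alpha)})\} + \alpha + \delta = \mu_n^{k}(F^{m_{\delta}(\alpha)}) + \alpha + \delta.$

Similarly we obtain $\mu_n^{k}(F) \le \mu_0^{k}(F^{m_{\delta}(\alpha)}) + \alpha + \delta$, which implies that

$L(\mu_n^{k}, \mu_0^{k}) \le \max(\alpha + \delta, m_{\delta}(\alpha)),$

and also

$\sup_k L(\mu_n^{k}, \mu_0^{k}) \le \max(\alpha + \delta, m_{\delta}(\alpha)).$

Let $\varepsilon > 0$ be arbitrary. Choose a fixed $\delta < \varepsilon/2$ and then find $\alpha$ such that $\alpha < \varepsilon/2$ and $m_{\delta}(\alpha) < \varepsilon$. Then choose $N = N_{\alpha}$ such that for $n > N$ we have $L(\mu_n, \mu_0) \le \alpha$. It follows that for $n > N$

$\sup_k L(\mu_n^{k}, \mu_0^{k}) < \varepsilon,$

which was to be proved.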


Proof of Theorem 2. Suppose that the condition (2.1) is satisfied. Let $\varepsilon > 0$ be arbitrary and let, for $n > N_{\varepsilon}$,

(2.2) $\sup_k L(\mu_n^{k}, \mu_0^{k}) < \varepsilon.$

By Lemma 2, for every $n$ and every compact set $F$ we can find $k = k_{n,F}$ such that

(2.3) $\mu_n\{\varphi_k^{-1}[\varphi_k(F)^{\varepsilon}]\} \le \mu_n(F_*^{2\varepsilon}) + \varepsilon.$

Then, for every compact set $F$ we can write the following chain of inequalities, using the conditions (2.2) and (2.3):

$\mu_n(F) \le \mu_n\{\varphi_k^{-1}[\varphi_k(F)]\} = \mu_n^{k}\{\varphi_k(F)\} \le \mu_0^{k}\{\varphi_k(F)^{\varepsilon}\} + \varepsilon = \mu_0\{\varphi_k^{-1}[\varphi_k(F)^{\varepsilon}]\} + \varepsilon \le \mu_0(F_*^{2\varepsilon}) + 2\varepsilon.$

In a similar way we prove the inequality

$\mu_0(F) \le \mu_n(F_*^{2\varepsilon}) + 2\varepsilon.$

Hence, by Remark 1 (made at the beginning of this paper) we get

$L_*(\mu_n, \mu_0) \le 2\varepsilon$ for $n > N_{\varepsilon}$,

where the distance $L_*$ is calculated according to the metric $\rho^*$. Thus $\mu_n \Rightarrow \mu_0$ in the topology generated by the metric $\rho^*$, and $\limsup_{n \to \infty} \mu_n(F) \le \mu_0(F)$ for all sets $F$ closed in the metric $\rho^*$, hence, for all sets $F$ closed in the original metric $\rho$, which was to be proved.


3. Some lemmas. Let $\mu \in M(D[0, 1])$. We shall say that $t_0$ is a continuity point of the measure $\mu$ if the set of those $f \in D[0, 1]$ which are discontinuous at the point $t_0$ is of $\mu$-measure zero.

The first lemma we are going to prove gives some regularity properties of the behavior of the functions $f$ in the neighborhood of the continuity points.

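Lemma 3. If $t_1, \ldots, t_m$ are arbitrary continuity points of the measure $\mu$, then for every $\varepsilon > 0$

(3.1) $\inf_{c} \Big[\mu\Big(\bigcap_{k=1}^{m} \{f;\ \sup_{t \in T_k^{c}} f(t) < x_k + \varepsilon\}\Big) - \mu\Big(\bigcap_{k=1}^{m} \{f;\ f(t_k) < x_k\}\Big)\Big] \ge 0,$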


where $T_k^{c}$ denotes the interval $[t_k - c, t_k + c]$. If for some $\varepsilon > 0$, and for some particular points $t_1, \ldots, t_m$ (without the assumption that they are continuity points) the relation (3.1) is not satisfied, then there exists $\alpha_0 > 0$ and $t_k$ (among $t_1, \ldots, t_m$) such that

(3.2) $\mu\{f;\ |f(t_k + 0) - f(t_k - 0)| > \alpha_0\} > \alpha_0.$


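Proof. Suppose that the points $t_1, \ldots, t_m$ are continuity points of the measure $\mu$. Then for arbitrary $c > 0$, $\varepsilon > 0$ and arbitrary $x_1, \ldots, x_m$ we have

$\mu\Big(\bigcap_{k=1}^{m} \{f;\ f(t_k) < x_k\}\Big)$

$= \mu\Big(\bigcap_{k=1}^{m} \{f;\ f(t_k) < x_k\} \cap \bigcap_{k=1}^{m} \{f;\ \sup_{t \in T_k^{c}} |f(t) - f(t_k)| < \varepsilon\}\Big)$

$+ \mu\Big(\bigcap_{k=1}^{m} \{f;\ f(t_k) < x_k\} \cap \bigcup_{k=1}^{m} \{f;\ \sup_{t \in T_k^{c}} |f(t) - f(t_k)| \ge \varepsilon\}\Big)$

$\le \mu\Big(\bigcap_{k=1}^{m} \{f;\ \sup_{t \in T_k^{c}} f(t) < x_k + \varepsilon\}\Big) + \sum_{k=1}^{m} \mu\{f;\ \sup_{t \in T_k^{c}} |f(t) - f(t_k)| \ge \varepsilon\},$

or

$\mu\Big(\bigcap_{k=1}^{m} \{f;\ \sup_{t \in T_k^{c}} f(t) < x_k + \varepsilon\}\Big) - \mu\Big(\bigcap_{k=1}^{m} \{f;\ f(t_k) < x_k\}\Big) + \sum_{k=1}^{m} \mu\{f;\ \sup_{t \in T_k^{c}} |f(t) - f(t_k)| \ge \varepsilon\} \ge 0.$

Let $c_n$ be an arbitrary sequence of numbers decreasing to 0. Denote by $A_{n,k}$ the set of those $f$'s such that

$\sup_{t \in T_k^{c_n}} |f(t) - f(t_k)| \ge \varepsilon.$

Then $A_{n,k} \supset A_{n+1,k}$ and

$\inf_{n} \Big[\mu\Big(\bigcap_{k=1}^{m} \{f;\ \sup_{t \in T_k^{c_n}} f(t) < x_k + \varepsilon\}\Big) - \mu\Big(\bigcap_{k=1}^{m} \{f;\ f(t_k) < x_k\}\Big)\Big] + \lim_{n \to \infty} \sum_{k=1}^{m} \mu(A_{n,k}) \ge 0.$

On the other hand, for every $k$ we have

$A_k = \bigcap_{n=1}^{\infty} A_{n,k} \subset \{f;\ |f(t_k + 0) - f(t_k - 0)| \ge \varepsilon\},$

and $0 = \mu(A_k) = \lim_{n \to \infty} \mu(A_{n,k})$, which proves the inequality (3.1). The second part of Lemma 3 follows immediately from the first part.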

The succeeding lemmas which we shall prove will give us some connection between neighborhoods in the spaces $D[0, 1]$ and $R_m$.

Let $t_1, \ldots, t_m$ be fixed points of the interval $[0, 1]$ and let $\varphi$ denote the function mapping $D[0, 1]$ into $R_m$, defined as $\varphi(f) = \{f(t_1 - 0), \ldots, f(t_m - 0)\}$. Let $B$ be an arbitrary compact set in $D[0, 1]$ and let $C = C[0, 1]$ denote, as usual, the space of all real continuous functions on $[0, 1]$.

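Lemma 4. For an arbitrary set $K \subset R_m$,

(3.3) $[\varphi^{-1}(K) \cap B]^{\varepsilon} \cap C \subset \varphi^{-1}(K^{\psi_B(\varepsilon)})$

and

(3.4) $[\varphi^{-1}(K) \cap B \cap C]^{\varepsilon} \subset \varphi^{-1}(K^{\theta_B(\varepsilon)}),$

where $\psi_B(\varepsilon) \downarrow 0$ and $\theta_B(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$.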

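Proof. Note first that, since pre-images and $\varepsilon$-neighborhoods are additive, it is sufficient to prove inclusions (3.3) and (3.4) for sets $K = \{x_1, \ldots, x_m\}$.

To prove (3.3) suppose that $f \in [\varphi^{-1}(\{x_1, \ldots, x_m\}) \cap B]^{\varepsilon} \cap C$. It means that there exists a function $g \in D[0, 1]$ such that $g \in \varphi^{-1}(\{x_1, \ldots, x_m\}) \cap B$ and $d(f, g) < \varepsilon$. In other words

(1) $g(t_k) = x_k$, $k = 1, 2, \ldots, m$,

(2) $g \in B$,

(3) $d(f, g) < \varepsilon$,

(4) $f$ is continuous.

Condition (2) means that $\tilde{w}_g(\alpha) \le h_B(\alpha)$ for $0 \le \alpha \le 1$, where $h_B(\alpha)$ is the function appearing in the condition for compactness of the subsets of $D[0, 1]$ given by Theorem F of the Introduction. From condition (3) it follows that $d_1(f, g) < \varepsilon$ and $d_2(f, g) < \varepsilon$, where $d_1$ and $d_2$ are defined by (1.9) and (1.10). Then there exist points $t_1', \ldots, t_m'$ with

$\max_{1 \le k \le m} |t_k - t_k'| < \varepsilon$

and

$\max_{1 \le k \le m} |f(t_k') - g(t_k)| < \varepsilon.$

Using (1) we obtain

(3.5) $\max_{1 \le k \le m} |f(t_k') - x_k| < \varepsilon.$

Condition $d_2(f, g) < \varepsilon$ implies that for every $z$

$\tilde{w}_g(e^{z - \varepsilon}) - \varepsilon \le \tilde{w}_f(e^{z}) \le \tilde{w}_g(e^{z + \varepsilon}) + \varepsilon$

and hence by (2)

$\tilde{w}_f(e^{z}) \le \varepsilon + h_B(e^{z + \varepsilon}).$

Putting $e^{z} = \varepsilon$ we obtain for sufficiently small $\varepsilon$

$\tilde{w}_f(\varepsilon) \le \varepsilon + h_B(e^{\varepsilon} \cdot \varepsilon) \le \varepsilon + h_B(2\varepsilon).$

Since $f$ is supposed to be continuous, we may use (1.7) obtaining

$w_f(\varepsilon) \le 4(\varepsilon + h_B(2\varepsilon)),$

which means that

(3.6) $\sup_{\tau_1, \tau_2;\, |\tau_1 - \tau_2| < \varepsilon} |f(\tau_1) - f(\tau_2)| \le 4(\varepsilon + h_B(2\varepsilon)).$

Combining (3.5) and (3.6) we obtain

$\max_{1 \le k \le m} |f(t_k) - x_k| \le \varepsilon + 4(\varepsilon + h_B(2\varepsilon)) = \psi_B(\varepsilon),$

which implies that $f \in \varphi^{-1}(\{x_1, \ldots, x_m\}^{\psi_B(\varepsilon)})$, and according to Theorem F, we have $h_B(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$, which proves the first part of Lemma 4. To prove inclusion (3.4), suppose that $f \in [\varphi^{-1}(\{x_1, \ldots, x_m\}) \cap B \cap C]^{\varepsilon}$. This means that there exists a function $g \in D[0, 1]$ such that

(1) $g(t_k) = x_k$, $k = 1, 2, \ldots, m$,

(2) $g \in B$,

(3) $g$ is continuous,

(4) $d(f, g) < \varepsilon$.

First we shall prove that the function $f$ cannot have jumps greater than $8h_B(\varepsilon) + 2\varepsilon$, where $h_B(\varepsilon)$ denotes, as before, the function appearing in the condition for compactness of the set $B$. In fact, suppose that for some $t_0$ we have

$|f(t_0 + 0) - f(t_0 - 0)| > 8h_B(\varepsilon) + 2\varepsilon.$

It follows that $\max\{|g(t_0) - f(t_0 + 0)|, |g(t_0) - f(t_0 - 0)|\} > 4h_B(\varepsilon) + \varepsilon$. Suppose, for instance, that $|f(t_0 + 0) - g(t_0)| > 4h_B(\varepsilon) + \varepsilon$. Since $g \in B$ and $g$ is continuous, it follows from (1.7) that

$w_g(\varepsilon) \le 4\tilde{w}_g(\varepsilon) \le 4h_B(\varepsilon).$

This condition implies that

$\sup_{t;\, |t - t_0| < \varepsilon} |g(t) - g(t_0)| \le 4h_B(\varepsilon)$

and we obtain for some $t'$ with $|t_0 - t'| < \varepsilon$:

$|g(t') - f(t_0 + 0)| > \varepsilon + 4h_B(\varepsilon) - 4h_B(\varepsilon) = \varepsilon,$

hence $d_1(f, g) \ge \varepsilon$ in contradiction with (4).

Now, from (1) and (4) it follows that there exist points $t_1', \ldots, t_m'$ with

$\max_{1 \le k \le m} |t_k' - t_k| < \varepsilon$

and

(3.7) $\max_{1 \le k \le m} |f(t_k') - x_k| < \varepsilon.$

From (4) it follows that for every $z$

$\tilde{w}_g(e^{z - \varepsilon}) - \varepsilon \le \tilde{w}_f(e^{z}) \le \tilde{w}_g(e^{z + \varepsilon}) + \varepsilon,$

which implies that

$\tilde{w}_f(e^{z}) \le \varepsilon + h_B(e^{z + \varepsilon}).$

Putting $e^{z} = \varepsilon$ we get for sufficiently small $\varepsilon$

$\tilde{w}_f(\varepsilon) \le \varepsilon + h_B(2\varepsilon)$

and applying (1.7) with $c = 2\varepsilon + 8h_B(\varepsilon)$ we get

(3.8) $w_f(\varepsilon) \le 4[\varepsilon + h_B(2\varepsilon) + (2\varepsilon + 8h_B(\varepsilon))].$

Combining (3.7) and (3.8) we obtain

$\max_{1 \le k \le m} |f(t_k) - x_k| \le \varepsilon + 4[\varepsilon + h_B(2\varepsilon) + (2\varepsilon + 8h_B(\varepsilon))] = \theta_B(\varepsilon),$

which means that $f \in \varphi^{-1}(\{x_1, \ldots, x_m\}^{\theta_B(\varepsilon)})$. Since $\theta_B(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$, Lemma 4 is proved.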
Let $t_1, \ldots, t_m, \ldots$ be a fixed sequence of points dense in $[0, 1]$. Denote by $\varphi_k$ the mapping of $D[0, 1]$ into $R_k$, defined as $\varphi_k(f) = \{f(t_1 - 0), \ldots, f(t_k - 0)\}$.

We shall prove

Lemma 5. If $F \subset D[0, 1]$ is compact, then

(3.9) $\bigcap_{k=1}^{\infty} \varphi_k^{-1}[\varphi_k(F)^{\varepsilon}] \subset F^{4\varepsilon}.$

Proof. Suppose that $f \in \bigcap_{k=1}^{\infty} \varphi_k^{-1}[\varphi_k(F)^{\varepsilon}]$. It follows that for every $k$ there exists $g_k \in F$, such that

(3.10) $\max_{1 \le j \le k} |f(t_j) - g_k(t_j)| < \varepsilon.$

Since $F$ is supposed to be compact, we can select a convergent subsequence from the sequence $\{g_k\}$. Without loss of generality, we may assume that $g_k \to g \in F$. Let $\tau$ be a continuity point of functions $f$ and $g$. For an arbitrary $\delta > 0$ let us choose a point $t_m$ from the sequence $\{t_k\}$ which is a continuity point of the functions $f$ and $g$ and such that

$|f(\tau) - f(t_m)| < \delta, \qquad |g(\tau) - g(t_m)| < \delta.$

For all sufficiently large $k$ we have

$|g(t_m) - g_k(t_m)| < \delta,$

hence by (3.10)

$|f(\tau) - g(\tau)| \le |f(\tau) - f(t_m)| + |f(t_m) - g_k(t_m)| + |g_k(t_m) - g(t_m)| + |g(t_m) - g(\tau)| < \varepsilon + 3\delta.$

Since $\delta$ is arbitrary we get

$\sup_{\tau \in A} |f(\tau) - g(\tau)| \le \varepsilon,$

where $A$ denotes the set of points in $[0, 1]$ at which both functions $f$ and $g$ are continuous. Hence

$\sup_{0 \le t \le 1} |f(t - 0) - g(t - 0)| \le \varepsilon,$

$\sup_{0 \le t \le 1} |f(t + 0) - g(t + 0)| \le \varepsilon,$

and by the convention concerning the identification of elements of $D[0, 1]$ and the relation (1.12), we get $d(f, g) \le 3\varepsilon < 4\varepsilon$, which was to be proved.
4. Criteria for convergence of measures in functional spaces. Let $\mu_n \in M(D[0, 1])$, $n = 0, 1, 2, \ldots$.

We shall give some conditions for the weak convergence $\mu_n \Rightarrow \mu_0$ expressed in terms of distances $L(\mu_n^{t_1, \ldots, t_m}, \mu_0^{t_1, \ldots, t_m})$ of corresponding measures generated in an $m$-dimensional space $R_m$. All these conditions consist of requirements of some kind of uniformity in $t_1, \ldots, t_m$.

We shall repeat that point $t_0$ is a continuity point of measure $\mu$ if the set of those functions $f \in D[0, 1]$ which are discontinuous at the point $t_0$ is of $\mu$-measure zero. If every point $t \in [0, 1]$ is a continuity point of measure $\mu$, then we shall say that the measure $\mu$ has no fixed points of discontinuity (note that this does not imply that $\mu\{(C[0, 1])^{c}\} = 0$).

THEOREM 3. Let $\mu_n \in M(D[0, 1])$, $n = 0, 1, 2, \ldots$ and let measure $\mu_0$ have no fixed points of discontinuity. Then for the convergence $\mu_n \Rightarrow \mu_0$ it is necessary that

(4.1) $\lim_{n \to \infty} \sup_{t_1, \ldots, t_m} L(\mu_n^{t_1, \ldots, t_m}, \mu_0^{t_1, \ldots, t_m}) = 0$

for every $m$.
Proof. Note first that the convergence

$\lim_{n \to \infty} L(\mu_n^{t_1, \ldots, t_m}, \mu_0^{t_1, \ldots, t_m}) = 0$

for every fixed $t_1, \ldots, t_m \in [0, 1]$ follows from Theorem D and from the fact that the function $\varphi_{t_1, \ldots, t_m}(f) = \{f(t_1 - 0), \ldots, f(t_m - 0)\}$ is a $\mu_0$-almost everywhere continuous mapping of $D[0, 1]$ into $R_m$.

To prove Theorem 3 suppose that condition (4.1) is not satisfied. Then there exist a number $\varepsilon_0 > 0$, a sequence $n_i \to \infty$, a number $m_0$ and a sequence $\{t_1^{i}, \ldots, t_{m_0}^{i}\}$ such that

(4.2) $L(\mu_{n_i}^{t_1^{i}, \ldots, t_{m_0}^{i}}, \mu_0^{t_1^{i}, \ldots, t_{m_0}^{i}}) > \varepsilon_0.$

The set of measures $\{\mu_0^{t_1, \ldots, t_m}\} \subset M(R_m)$ is compact. In fact, consider an arbitrary sequence $\{\mu_0^{t_1^{k}, \ldots, t_m^{k}}\}$. Let us select from the sequence $\{t_1^{k}, \ldots, t_m^{k}\}$ a subsequence $\{t_1^{k_j}, \ldots, t_m^{k_j}\}$ convergent to $(t_1^{0}, \ldots, t_m^{0})$. Since the measure $\mu_0$ has no fixed points of discontinuity, we have

$\mu_0^{t_1^{k_j}, \ldots, t_m^{k_j}} \Rightarrow \mu_0^{t_1^{0}, \ldots, t_m^{0}}$ as $j \to \infty$.

By Theorem C, for arbitrary $\delta > 0$, there exists a compact set $B_{\delta} \subset R_m$ such that

$\sup_{t_1, \ldots, t_m} \mu_0^{t_1, \ldots, t_m}(B_{\delta}^{c}) < \delta.$

It follows that the distance $L$ in the formula (4.2) can be replaced by the Lévy distance $L^*$ defined by the formula (1.4). For simplicity of notation we may assume, without loss of generality, that $m_0 = 1$. Then formula (4.2) takes the form

(4.3) $L^*(\mu_{n_i}^{t_i}, \mu_0^{t_i}) > \varepsilon_0, \qquad i = 1, 2, \ldots.$

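Furthermore, we may assume without loss of generality that $t_i \to t_0$. Since measure $\mu_0$ was supposed to be without fixed points of discontinuity, we have

(4.4) $\lim_{i \to \infty} L^*(\mu_0^{t_i}, \mu_0^{t_0}) = 0;$

hence for all sufficiently large indices $i$ we have

(4.5) $L^*(\mu_{n_i}^{t_i}, \mu_0^{t_0}) > \varepsilon_0/2.$

From (4.5) it follows that there exists a sequence $\{x_i\}$ such that

(4.6a) $\mu_{n_i}\{f;\ f(t_i) < x_i + \tfrac{1}{2}\varepsilon_0\} + \tfrac{1}{2}\varepsilon_0 < \mu_0\{f;\ f(t_0) < x_i\}$

or

(4.6b) $\mu_{n_i}\{f;\ f(t_i) < x_i - \tfrac{1}{2}\varepsilon_0\} - \tfrac{1}{2}\varepsilon_0 > \mu_0\{f;\ f(t_0) < x_i\}.$

Suppose now that the first inequality (4.6a) holds for infinitely many $i$'s. Then for any $c > 0$ and for every sufficiently large $i$ for which (4.6a) holds, we have

(4.7) $\mu_{n_i}\{f;\ f(t_i) < x_i + \tfrac{1}{2}\varepsilon_0\} \ge \mu_{n_i}\{f;\ \sup_{t \in T_0^{c}} f(t) < x_i + \tfrac{1}{2}\varepsilon_0\},$

where $T_0^{c} = [t_0 - c, t_0 + c]$.

On the other hand, according to Lemma 3, for $\varepsilon = \tfrac{1}{4}\varepsilon_0$ we have

(4.8) $\mu_0\{f;\ f(t_0) < x_i\} < \mu_0\{f;\ \sup_{t \in T_0^{c}} f(t) < x_i + \tfrac{1}{4}\varepsilon_0\} + \tfrac{1}{4}\varepsilon_0.$

From (4.6a), (4.7) and (4.8) we obtain

$\mu_0\{f;\ \sup_{t \in T_0^{c}} f(t) < x_i + \tfrac{1}{4}\varepsilon_0\} > \mu_{n_i}\{f;\ \sup_{t \in T_0^{c}} f(t) < x_i + \tfrac{1}{2}\varepsilon_0\} + \tfrac{1}{4}\varepsilon_0,$

hence $L^*(\mu_{n_i}^{\varphi}, \mu_0^{\varphi}) \ge \tfrac{1}{4}\varepsilon_0$, where

$\varphi(f) = \sup_{t \in T_0^{c}} f(t).$

Since $\varphi$ is a $\mu_0$-almost everywhere continuous function on $D[0, 1]$, it follows that $\mu_{n_i} \not\Rightarrow \mu_0$.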

Suppose now that the inequality (4.6a) holds only for a finite number of indices $i$; hence, beginning from some $i$, the inequality (4.6b) holds. If the points $t_i$ are continuity points of infinitely many of the measures $\mu_{n_i}$, then the inequalities (4.7) and (4.8) are true with $\mu_{n_i}$ and $\mu_0$ interchanged, and the proof remains unchanged. If for all sufficiently large $i$ the points $t_i$ are the discontinuity points of the measures $\mu_{n_i}$, then, according to Lemma 3, there exists a sequence $\beta_i$ such that

(4.9) $\mu_{n_i}\{f;\ |f(t_i + 0) - f(t_i - 0)| > \beta_i\} > \beta_i.$

Let $\beta_i$ be the upper bound of the numbers $\beta_i$ for which the inequality (4.9) holds. If $\beta_i \to 0$, then, again, we can make the above estimations. Suppose, then, that $\beta_{i_k} > \beta_0 > 0$, $k = 1, 2, \ldots$. Take a $\delta > 0$ such that for the interval $\Delta_{\delta} = [t_0 - \delta, t_0 + \delta]$ we have $\mu_0\{f;\ w_f(\Delta_{\delta}) > \beta_0\} < \tfrac{1}{2}\beta_0$, where $w_f(\Delta_{\delta})$ is defined as $\sup_{t_1, t_2 \in \Delta_{\delta}} |f(t_1) - f(t_2)|$. On the other hand, for all sufficiently large $i_k$, we have by (4.9)

$\beta_0 < \beta_{i_k} < \mu_{n_{i_k}}\{f;\ |f(t_{i_k} + 0) - f(t_{i_k} - 0)| > \beta_{i_k}\} \le \mu_{n_{i_k}}\{f;\ w_f(\Delta_{\delta}) > \beta_0\}$

and hence $\mu_{n_{i_k}}^{\varphi} \not\Rightarrow \mu_0^{\varphi}$, where $\varphi(f) = w_f(\Delta_{\delta})$. Since $\varphi$ is a $\mu_0$-almost everywhere continuous function on $D[0, 1]$ it follows that $\mu_{n_{i_k}} \not\Rightarrow \mu_0$, as asserted.


THEOREM 4. Let $\mu_n \in M(D[0, 1])$, $n = 0, 1, 2, \ldots$ and let the measure $\mu_0$ satisfy the condition $\mu_0\{(C[0, 1])^{c}\} = 0$. Then a necessary condition for the convergence $\mu_n \Rightarrow \mu_0$ is

(4.10) $\lim_{n \to \infty} \sup_{m} \sup_{t_1, \ldots, t_m} L(\mu_n^{t_1, \ldots, t_m}, \mu_0^{t_1, \ldots, t_m}) = 0.$

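Proof. Suppose that $\mu_n \Rightarrow \mu_0$. According to Theorem B, it follows that $L(\mu_n, \mu_0) \to 0$ and according to Theorem C, it follows that for any given $\delta > 0$ there exists a compact set $B_{\delta} \subset D[0, 1]$ such that $\sup_n \mu_n(B_{\delta}^{c}) < \delta$. Let $t_1, \ldots, t_m$ be arbitrary points in $[0, 1]$ and let $\varphi$ be the function appearing in Lemma 4.

Suppose that for some fixed $n$ we have $L(\mu_n, \mu_0) \le \alpha$. According to (1.2), it follows that

(4.11) $\mu_n(F) \le \mu_0(F^{\alpha}) + \alpha$ and $\mu_0(F) \le \mu_n(F^{\alpha}) + \alpha$

for all closed sets $F \subset D[0, 1]$.

Let $K \subset R_m$ be an arbitrary closed set. Then the sets

$\varphi^{-1}(K) \cap B_{\delta}$ and $\varphi^{-1}(K) \cap B_{\delta} \cap C[0, 1]$

are also closed, and we can write the following two chains of inequalities, using Lemma 4 and (4.11):

(4.12) $\mu_n^{t_1, \ldots, t_m}(K) = \mu_n\{\varphi^{-1}(K)\} \le \mu_n\{\varphi^{-1}(K) \cap B_{\delta}\} + \delta \le \mu_0\{[\varphi^{-1}(K) \cap B_{\delta}]^{\alpha}\} + \delta + \alpha = \mu_0\{[\varphi^{-1}(K) \cap B_{\delta}]^{\alpha} \cap C[0, 1]\} + \delta + \alpha \le \mu_0\{\varphi^{-1}(K^{\psi_{\delta}(\alpha)})\} + \delta + \alpha = \mu_0^{t_1, \ldots, t_m}(K^{\psi_{\delta}(\alpha)}) + \delta + \alpha;$

(4.13) $\mu_0^{t_1, \ldots, t_m}(K) = \mu_0\{\varphi^{-1}(K) \cap C[0, 1]\} \le \mu_0\{\varphi^{-1}(K) \cap C[0, 1] \cap B_{\delta}\} + \delta \le \mu_n\{[\varphi^{-1}(K) \cap C[0, 1] \cap B_{\delta}]^{\alpha}\} + \delta + \alpha \le \mu_n\{\varphi^{-1}(K^{\theta_{\delta}(\alpha)})\} + \delta + \alpha = \mu_n^{t_1, \ldots, t_m}(K^{\theta_{\delta}(\alpha)}) + \delta + \alpha.$

From (4.12) and (4.13) it follows, according to the definition of metric $L$, that

$L(\mu_n^{t_1, \ldots, t_m}, \mu_0^{t_1, \ldots, t_m}) \le \max(\delta + \alpha, \psi_{\delta}(\alpha), \theta_{\delta}(\alpha)),$

where $\delta > 0$ is arbitrary, $\psi_{\delta}(\alpha) \downarrow 0$ and $\theta_{\delta}(\alpha) \downarrow 0$ as $\alpha \downarrow 0$.

Let $\varepsilon > 0$ be arbitrary. Take $\delta < \tfrac{1}{2}\varepsilon$ and then take $\alpha < \delta/2$ such that

$\max(\psi_{\delta}(\alpha), \theta_{\delta}(\alpha)) < \varepsilon.$

For this value of $\alpha$ take $N = N_{\alpha}$ such that for $n > N$ we have $L(\mu_n, \mu_0) < \alpha$. Hence

$\sup_{m} \sup_{t_1, \ldots, t_m} L(\mu_n^{t_1, \ldots, t_m}, \mu_0^{t_1, \ldots, t_m}) < \varepsilon$

for all $n > N$, which was to be proved.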
THEOREM 5. Let $\mu_n \in M(D[0, 1])$, $n = 0, 1, 2, \cdots$. Then the condition

(4.14) $\lim_{n \to \infty} \sup_m \sup_{t_1, \cdots, t_m} L(\mu_n^{t_1, \cdots, t_m}, \mu_0^{t_1, \cdots, t_m}) = 0$

is sufficient for the convergence $\mu_n \Rightarrow \mu_0$.


PROOF. Suppose that condition (4.14) is satisfied. Let $\varepsilon > 0$ be arbitrary and
let, for $n > N_\varepsilon$,

(4.15) $\sup_m \sup_{t_1, \cdots, t_m} L(\mu_n^{t_1, \cdots, t_m}, \mu_0^{t_1, \cdots, t_m}) < \tfrac12\varepsilon.$

Let $t_1, \cdots, t_m, \cdots$ be a sequence of points dense in [0, 1]. Denote by $\varphi_k$ the
function appearing in Lemma 5. By Lemma 5, for every compact set $F \subset D[0, 1]$
and for arbitrary $n = 0, 1, \cdots$ there exists k = k(n, F) such that

(4.16) untge lee(F)*}} S un(F*) + fe. 


By (4.15) and (4.16), we can write for n > N, the following chain of inequal- 
ities: 
_ 7 cece 7 he 
un(F) S unalone loe(F)}} = aM ee(P)} S wot {ge( F)*4 + de 
—1 y\2¢€ , + 
= wolgir'lee(F)*} + te S mo(F*) + te + fe = wo(F*) + . 
In a similar way we prove that for all compact sets F

$\mu_0(F) \le \mu_n(F^\varepsilon) + \varepsilon;$

hence, by Remark 1 (made in the Introduction) $L(\mu_n, \mu_0) \le \varepsilon$ for $n > N_\varepsilon$,
which was to be proved.

As an immediate consequence of Theorems 4 and 5 (and also Theorems 1 and 
2), we obtain 

THEOREM 6. If $\mu_n \in M(C[0, 1])$, $n = 0, 1, 2, \cdots$, then for the convergence

$\mu_n \Rightarrow \mu_0$

it is necessary and sufficient that

(4.17) $\lim_{n \to \infty} \sup_m \sup_{t_1, \cdots, t_m} L(\mu_n^{t_1, \cdots, t_m}, \mu_0^{t_1, \cdots, t_m}) = 0.$

PROOF. Necessity of this condition has already been proved; sufficiency follows
from the fact that for the space $C[0, 1] \subset D[0, 1]$ the d-convergence is
equivalent to the uniform one.







REFERENCES 


[1] Yu. V. Prohorov, "Convergence of random processes and limit theorems in probability
theory" (in Russian, English summary), Teor. Veroyatnost. i Primenen., Vol. 1
(1956), pp. 177-238.

[2] Paul R. Halmos, Measure Theory, D. Van Nostrand, New York, 1950.

[3] B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent
Random Variables, Addison-Wesley, Cambridge, Mass., 1954.





EXPONENTIAL BOUNDS ON THE PROBABILITY OF ERROR FOR A 
DISCRETE MEMORYLESS CHANNEL 


By Samuel Kotz
Cornell University and Bar-Ilan University, Ramat-Gan, Israel 


1. Summary. In a paper by Blackwell, Breiman and Thomasian [1, Theorem 3]
the following theorem is proved:

For any integer n and for any $0 < \varepsilon \le \tfrac12$, such that $C - \varepsilon \ge 0$ there exists a
code for a discrete memoryless channel with length $N > e^{n(C - \varepsilon)}$ and with a bound
for the probability of error, $\lambda = 2 \exp_e[-n\varepsilon^2/(16ab)]$, where C is the capacity of
the channel and a and b are the numbers of elements in the input and output alphabets
respectively.

In this note we shall replace the bound $2 \exp_e[-n\varepsilon^2/(16ab)]$ by the expression
$2 \exp_e\{-n\varepsilon^2/[g(c)(\log c)^{2-\delta}]\}$, where $c = \min(a, b)$, g(c) is a positive
monotonically decreasing function of c, g(c) < 16 for all $c \ge 3$ and approaches 2
asymptotically as $c \to \infty$, and $\delta > 0$ depends on $\varepsilon$ and c and tends to 0 as either
$c \to \infty$ or $\varepsilon \to 0$.
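To give a feeling for the size of the improvement, the following sketch evaluates both expressions. It is only illustrative: natural logarithms are assumed, the value g = 16 is a conservative placeholder (the text only states g(c) < 16 for c ≥ 3), and δ is set to 0, which for c ≥ 3 can only weaken the second bound. The function names are hypothetical.

```python
import math

def bbt_bound(n, eps, a, b):
    """Bound quoted from [1, Theorem 3]: 2 exp(-n eps^2 / (16 a b))."""
    return 2.0 * math.exp(-n * eps ** 2 / (16.0 * a * b))

def log_bound(n, eps, a, b, g=16.0, delta=0.0):
    """Expression 2 exp(-n eps^2 / (g (log c)^(2 - delta))) with c = min(a, b).

    g and delta are assumptions of this sketch, not values from the paper.
    """
    c = min(a, b)
    return 2.0 * math.exp(-n * eps ** 2 / (g * math.log(c) ** (2.0 - delta)))

# Example: block length 10000, eps = 0.1, alphabets of 10 letters each.
ratio = log_bound(10_000, 0.1, 10, 10) / bbt_bound(10_000, 0.1, 10, 10)
```

Since $(\log c)^2 < c^2 \le ab$, the exponent of the second expression is always the larger one in absolute value under these assumptions.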


2. Preliminary Lemmas. 
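LEMMA 1. Let $P_{ij} \ge 0$ $(i = 1, \cdots, a;\ j = 1, \cdots, b)$, $\sum_{i,j}^{a,b} P_{ij} = 1$,
$P_i = \sum_j P_{ij}$, $Q_j = \sum_i P_{ij}$, and $c = \min(a, b)$. Then

(1) $\sum_{i,j}^{a,b} P_{ij} \left(\log \frac{P_{ij}}{P_i Q_j}\right)^2 \le [\log(1 + e + c)]^2$ for all c,

(2) $\sum_{i,j}^{a,b} P_{ij} \left(\log \frac{P_{ij}}{P_i Q_j}\right)^2 \le 2.343 (\log c)^2$ for c = 2,

(3) $\sum_{i,j}^{a,b} P_{ij} \left(\log \frac{P_{ij}}{P_i Q_j}\right)^2 \le 2(\log c)^2$ for $c \ge 3$,

(4) $\sum_{i,j}^{a,b} P_{ij} \left(\log \frac{P_{ij}}{P_i Q_j}\right)^2 \le 4e^{-2} + (\log c)^2$ for $c \ge 12$.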

PROOF: 
(1). Let 
8} t (2, 7) 0 < P;;/(P:Q;) < &"} 
s = {(i,j) |e" S Pis/(P:Q;) s 4 
83 {(4, 7) | Pi3/(P:Q;) > 4 
and let S = > Pi: {log [P;;/(P.Q;)]}". Then 
Received August 23, 1960. 







S = 2 Pif (5%) > a Pi; fle) + > Pf (y “), 


2 
where f(x) = (log x)", convex for x 2 e. 
Since the arguments of f in (1) are all = e, S S f(K), where 


. P;Q; 
K= P= vetLp 


Py P; 4 


since K = Pas PQ ja:; , where all 2;; are = 1. However, 


S Pott Qi < . 2. Pah 


81 Pi; 


a 
> 
> s>F => 5 (b Pu) =4, 
83 P; a; = int P; J - 
and similarly >_,, Pi;/(P:Q;) < b. Since f(x) is monotonically increasing for 
x = 1, the result follows 

2 


(2) and (3). Consider $f(P_1, \cdots, P_n) = \sum_{i=1}^{n} P_i(\log P_i)^2$, where $P_i \ge 0$
and $\sum_{i=1}^{n} P_i = 1$.

Using the method of Lagrange multipliers we easily find the unique maximum
of this function for the case n > e (i.e., $n \ge 3$) to be $(\log n)^2$, which is attained
for $P_i = n^{-1}$ $(i = 1, \cdots, n)$. Let, now,

(6) $S = \sum P_{ij}\left(\log \frac{P_{ij}}{P_i Q_j}\right)^2 = \sum P_{ij}\left(\log \frac{P_{ij}}{Q_j}\right)^2 - 2\sum P_{ij}\left(\log \frac{P_{ij}}{Q_j}\right)(\log P_i) + \sum (\log P_i)^2 \cdot P_i.$

From the above it follows that the first and the last terms of (6) are $\le (\log a)^2$
and the second is non-positive. Hence, owing to the symmetry of S in i and j,
the assertion (3) follows.

(2) follows immediately by using the same method and considering the function
$x(\log x)^2 + (1 - x)[\log(1 - x)]^2$ for 0 < x < 1.
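The two extremal statements used above can be illustrated numerically. In this sketch (function names and the sampling scheme are assumptions), `sampled_max` searches the simplex at random for n ≥ 3 and `two_point_max` scans the two-point function used for (2) on a grid. Natural logarithms are assumed throughout.

```python
import math
import random

def f(p):
    """sum_i p_i (log p_i)^2, with the convention 0 * (log 0)^2 = 0."""
    return sum(x * math.log(x) ** 2 for x in p if x > 0.0)

def random_simplex_point(n, rng):
    """A random probability vector of length n (normalized exponentials)."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    t = sum(w)
    return [x / t for x in w]

def sampled_max(n, trials=20000, seed=1):
    """Largest value of f found by random search over the simplex."""
    rng = random.Random(seed)
    return max(f(random_simplex_point(n, rng)) for _ in range(trials))

def two_point_max(steps=20000):
    """Grid maximum of x (log x)^2 + (1 - x) (log(1 - x))^2 on (0, 1)."""
    return max(f([k / steps, 1.0 - k / steps]) for k in range(1, steps))
```

For n = 2 the uniform distribution is not the maximizer, which is why c = 2 needs the separate constant in (2): twice the grid maximum, divided by $(\log 2)^2$, comes out close to 2.343.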
(4). Let
$s_1 = \{(i, j) \mid 0 \le P_{ij}/(P_i Q_j) \le 1\}$; $s_2 = \{(i, j) \mid 1 < P_{ij}/(P_i Q_j) \le e\}$;
$s_3 = \{(i, j) \mid P_{ij}/(P_i Q_j) > e\}$.

Let $f(x) = (\log x)^2$ $(x > 0)$; $h(x) = x \log^2 x$ $(x \ge 0)$ and $g_K(x) = x \log^2(x/K) - x$
$(x \ge 0$, K-integral).
It is easily seen by elementary methods that

$\max_{0 < x \le 1} g_K(x) = g_K(1) = (\log K)^2 - 1$ for $K > e^{1+\sqrt{2}}$,

$\max_{0 < x < 1} h(x) = 4e^{-2}$.







Now 


P 2 
* Pi (108 Pel +) > PiQjh (F ‘| se 2 Pi; (ioe P.t os) 
s max § (7 ioe walters say, 


since f(e) = 1 and this function is monotonically increasing on 8¢ . 


Moreover Pin P,,{log [P;;/(P.Q;)]?? = a>, [P;;/alf(P;,/(P.Q;)], and, since 


f is convex on 83 , we have 


a Pi; 
X Pi; (108 P, ra) < of (2 where => P.Q, 


) 
gz ee[oG)-4] 


Now @ S min (a, b) = c,andons;: e 
monotonicity of f on 8; , it follows that 


f(a/@) = f(0/a) S f(c/a) = f(a/c). 


From the definition of gx(2) and (7) we obtain 


V+ DU S14 9.(a). 


Thus 


< (c/a). Therefore, from the 


Hence 


~ + ¥ S (loge)? for ¢>e*¥? (i.e.,¢ = 12) 


From here the assertion follows. 
LEMMA 2. Let $0 < \delta \le 1$ and t > 0, then, for $x \ge \delta$,

$x^{-t} \le 1 - t \log x + \tfrac12 \delta^{-t}(t \log x)^2,$

where the equality occurs if and only if $x = \delta = 1$.
PROOF. The result follows directly from the obvious inequality

(8) $e^y \le 1 + y + (\tfrac12 e^R) y^2$, for all $y \le R$,

where R is any non-negative number, by substituting $y = -t \log x$ and
$R = -t \log \delta$. The equality in (8) holds if and only if y = R = 0.
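A quick numerical check of Lemma 2 on a grid may be reassuring. The sketch below (function names, grid range and tolerance are assumptions of this illustration) evaluates the right-hand side minus the left-hand side, which should never be negative for $x \ge \delta$.

```python
import math

def lemma2_gap(x, t, delta):
    """RHS minus LHS of x^(-t) <= 1 - t log x + (1/2) delta^(-t) (t log x)^2."""
    y = -t * math.log(x)
    return 1.0 + y + 0.5 * delta ** (-t) * y * y - x ** (-t)

def min_gap(delta, t, x_max=50.0, steps=5000):
    """Smallest gap over an evenly spaced grid of x in [delta, x_max]."""
    xs = (delta + (x_max - delta) * k / steps for k in range(steps + 1))
    return min(lemma2_gap(x, t, delta) for x in xs)
```

The gap vanishes at x = δ = 1, in line with the stated case of equality.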


3. Proof of the main result. Consider a discrete memoryless channel with
input alphabet having a (> 1) elements and the output alphabet having b (> 1)
elements. Let $P(\cdot)$ be a probability distribution on the elements i of the input
alphabet $(i = 1, \cdots, a)$, and let $P(\cdot \mid i)$ be a distribution of the elements j
of the output alphabet $(j = 1, \cdots, b)$ for every i of the input alphabet.

As in [1] we start with the r.v. J(P) defined by

$\Pr\left\{J(P) = \log \frac{P(i, j)}{P(i)Q(j)}\right\} = P(i, j)$ if $P(i, j) > 0$, and $= 0$ if $P(i, j) = 0$,

where $P(i, j) = P(j \mid i)P(i)$ and $Q(j) = \sum_i P(i, j)$.


It is well known (e.g., [1]) that the capacity C of a channel is defined by
$C = \sup_P EJ(P)$, where the supremum, taken over all possible input distributions,
is actually attained for some $P = \bar{P}$. We choose in the definition of the
r.v. J the input distribution to be $\bar{P}$, so that C = EJ.
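As a concrete instance of $C = \sup_P EJ(P)$, the sketch below computes EJ(P) in nats and maximizes it by a crude grid search for a binary-input channel. The binary symmetric channel, the grid search and all function names are assumptions of this illustration; they are not taken from the paper.

```python
import math

def mean_information(p_in, channel):
    """E J(P) = sum_ij P(i,j) log[P(i,j) / (P(i) Q(j))], in nats."""
    a, b = len(p_in), len(channel[0])
    joint = [[p_in[i] * channel[i][j] for j in range(b)] for i in range(a)]
    q = [sum(joint[i][j] for i in range(a)) for j in range(b)]
    return sum(
        joint[i][j] * math.log(joint[i][j] / (p_in[i] * q[j]))
        for i in range(a) for j in range(b) if joint[i][j] > 0.0
    )

def binary_symmetric(x):
    """Transition matrix P(j | i) of a binary symmetric channel with crossover x."""
    return [[1.0 - x, x], [x, 1.0 - x]]

def capacity_by_grid(channel, steps=1000):
    """Return (max E J, maximizing u) over binary inputs (u, 1 - u) on a grid."""
    return max(
        (mean_information([k / steps, 1.0 - k / steps], channel), k / steps)
        for k in range(1, steps)
    )
```

For the binary symmetric channel the maximum is attained at the uniform input, and the value agrees with the familiar closed form $\log 2 + x \log x + (1 - x)\log(1 - x)$.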
The moment generating function of J is given by

(9) $E(e^{-tJ}) = \sum_{i,j} P(i, j)\left[\frac{P(i, j)}{P(i)Q(j)}\right]^{-t}.$

Let t > 0 and $I_n = J_1 + \cdots + J_n$, where $J_k$ $(k = 1, \cdots, n)$ are independent,
identically distributed random variables with the distribution of J.
From Chebyshev's inequality it follows that, for any $\varepsilon > 0$,

(10) $\Pr\{I_n \le n(C - \varepsilon)\} \le [e^{t(C - \varepsilon)} E(e^{-tJ})]^n.$
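The right-hand side of (10) is easy to evaluate for a small channel. The sketch below does so for a given input distribution and transition matrix, with C taken to be EJ for that input; the grid over t and the function names are assumptions of this illustration.

```python
import math

def joint_ratios(p_in, channel):
    """Pairs (P(i,j), P(i,j) / (P(i) Q(j))) over the cells with P(i,j) > 0."""
    a, b = len(p_in), len(channel[0])
    joint = [[p_in[i] * channel[i][j] for j in range(b)] for i in range(a)]
    q = [sum(joint[i][j] for i in range(a)) for j in range(b)]
    return [(joint[i][j], joint[i][j] / (p_in[i] * q[j]))
            for i in range(a) for j in range(b) if joint[i][j] > 0.0]

def chernoff_bound(n, eps, t, p_in, channel):
    """[exp(t (C - eps)) E exp(-t J)]^n, with C = E J for the given input."""
    cells = joint_ratios(p_in, channel)
    c = sum(p * math.log(r) for p, r in cells)
    mgf = sum(p * r ** (-t) for p, r in cells)
    return (math.exp(t * (c - eps)) * mgf) ** n

def best_bound(n, eps, p_in, channel, steps=200):
    """Smallest value of the bound over a grid of t in (0, 1)."""
    return min(chernoff_bound(n, eps, k / steps, p_in, channel)
               for k in range(1, steps))
```

At t = 0 the bound is trivially 1; it drops below 1 for small positive t because the derivative of its logarithm at t = 0 equals $-n\varepsilon$.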
Let $0 < \delta < 1$ and t < 1. Denoting by $\sum'$ the sum in the right hand side of
(9) over all i, j for which $P(i, j)/[P(i)Q(j)] \le \delta$, we obtain

$\sum{}' P(i, j)\left[\frac{P(i, j)}{P(i)Q(j)}\right]^{-t} \le \delta^{1-t} \sum{}' P(i)Q(j) \le \delta^{1-t}.$

Denoting by $\sum''$ the sum in the right hand side of (9) over all i, j for which
$P(i, j)/[P(i)Q(j)] > \delta$ and using Lemma 2, we have

$\sum{}'' P(i, j)\left[\frac{P(i, j)}{P(i)Q(j)}\right]^{-t} \le \sum{}'' P(i, j)\left\{1 - t \log \frac{P(i, j)}{P(i)Q(j)} + \tfrac12 \delta^{-t}\left(t \log \frac{P(i, j)}{P(i)Q(j)}\right)^2\right\}.$

Let h(c) = 2.343 for c = 2, $h(c) = \min\{[\log(1 + e + c)/\log c]^2, 2\}$ for $3 \le c \le 11$
and $h(c) = \min\{[\log(1 + e + c)/\log c]^2, [4e^{-2} + (\log c)^2]/(\log c)^2\}$ for
$c \ge 12$.

Since $\delta < 1$, we obtain, using Lemma 1 and the definition of C,

$E(e^{-tJ}) \le 1 - tC + h(c)(\tfrac12 \delta^{-t})(t \log c)^2 + \delta^{1-t},$

where $c = \min(a, b)$.
Let $\delta^{1-t} = q t^2 (\log c)^2$, q > 0 and such that $q t^2 (\log c)^2 < 1$. We have

(11) $E(e^{-tJ}) \le 1 - tC + \frac{t^2}{2}\{h(c)[q t^2 (\log c)^2]^{-t/(1-t)} + 2q\}(\log c)^2.$


We minimize the expression in curly brackets of (11) with respect to q. The
unique minimum is obtained for

(12) $q_{\min} = [h(c)/2]^{1-t}[t/(1 - t)]^{1-t}(t \log c)^{-2t}.$

It can be easily checked that $q_{\min} t^2 (\log c)^2 < 1$ for all $c \ge 2$, and for all t
such that $t < \min(\tfrac12, [h(c)K(t)(\log c)^{2-2t}]^{-1})$, where

$K(t) = 2^t t^{-2t}\{((1 - t)/t)^t + (t/(1 - t))^{1-t}\},$

which we shall require soon.
Thus, since h(c) > 1, we obtain from (11) and (12)

(13) $E(e^{-tJ}) \le 1 - tC + \tfrac12 t^2 h(c)K(t)(\log c)^{2-2t}.$

K(t) tends to 1 as $t \to 0$ and the approach is monotonic starting from t = 0.5100.
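The passage from (11) to (13) can be checked numerically. In the sketch below the minimizing q is written in the form obtained by differentiating the bracket of (11) with respect to q, and K(t) in the form $2^t t^{-2t}\{((1-t)/t)^t + (t/(1-t))^{1-t}\}$; both forms, as well as the function names, are assumptions of this illustration in so far as the printed formulas are hard to read.

```python
import math

def bracket(q, t, h, L):
    """Expression in curly brackets of (11), with L standing for log c."""
    return h * (q * t * t * L * L) ** (-t / (1.0 - t)) + 2.0 * q

def q_min(t, h, L):
    """Stationary point of bracket() in q; the bracket is convex in q."""
    return (h * t / (2.0 * (1.0 - t))) ** (1.0 - t) * (t * L) ** (-2.0 * t)

def K(t):
    """K(t) = 2^t t^(-2t) {((1-t)/t)^t + (t/(1-t))^(1-t)}."""
    return 2.0 ** t * t ** (-2.0 * t) * (
        ((1.0 - t) / t) ** t + (t / (1.0 - t)) ** (1.0 - t)
    )

def quadratic_term(t, h, L):
    """(t^2 / 2) * bracket(q_min) * L^2, the last term of (11) at the minimum."""
    return 0.5 * t * t * bracket(q_min(t, h, L), t, h, L) * L * L
```

At the minimum this term equals $\tfrac12 t^2 h^{1-t} K(t) L^{2-2t}$, which is at most $\tfrac12 t^2 h K(t) L^{2-2t}$ when h > 1; that is the step "since h(c) > 1" leading to (13).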
Using the inequality $1 + x \le e^x$ we obtain from (10) and (13)

(14) $\Pr\{I_n \le n(C - \varepsilon)\} \le \exp\{-n[t\varepsilon - \tfrac12 t^2 h(c)K(t)(\log c)^{2-2t}]\}.$

We shall now assume that $c \ge 3$.

Let 0 < « S 3 be given. For each integer c we choose a real number m 1 
such that if t < [mh(c)|’, then K(t) S D and also {2D (log c)*~ a < 
m'.(Clearlym tT ~andD | lase f ~.) 

We set 


(15) t = to = ¢/{h(c)D(log c)* 


so that t < {2D h(c) (log ec)? 2PM #d Nt < 1 (see Table 1). 
Next, we define R = C — e and d = 2D h(c) (log c)” ©?) (clearly, 
d Tt « withc). For 0 < « < }, 


(16) R + (é€/d) S C — [1 — (2d)"le. 


From (16), (14) and (15) we obtain 


9 


{ 
- 2, v ne \ 
(17) Pr{J, S$ nlR + (e/d)]} s exp 4 at g(c) (log ¢)2—t«/ toa) (oe @)3)) (» 
where h(c)D2d/(d — 1) = g(c). (Clearly, g(c) | 2asc— ~.) 
As in [1], we will now apply the basic theorem of Feinstein which states:
For any discrete memoryless channel and for any two positive numbers $\theta$ and
$\lambda$, with $\lambda \le 1$, any input $P(\cdot)$, and any n, there exists a code $(n, N, \lambda)$ such that

(18) $N > e^{\theta}[\lambda - P\{I_n(P) \le \theta\}].$

(See [1].)
We set 


2) ea ne 
6 = n|R aa (e /d)| and r = 2 exp vane g(c) dog ¢)?-t/ Dro (lee 07)) | ° 


Applying (17) and (18), we obtain for the case of c 2 3 the existence of a code 
with length N > e"°°~® and probability of error 







( 2 


a) 
g 


ee. forO <eS 
c) (log c) 


The case c = 2 requires several obvious modifications in the definitions. One of
the possibilities is to set $[2D \log c]^{-1} < m^{-1}$, $t_0 = \varepsilon/[h(c)D \log c]$, $d = 2D h(c) \log c$
and to redefine

$g(c) = \frac{h(c)D\,2d}{(d - 1)\log c}.$


This case was treated numerically using the Cornell Computing Center's
Burroughs 220, where also the numerical values of g(c) for values of c in the
range of 3-25 were computed. The results of these computations are presented
in Table 1.


TABLE 1 


The computed values of h(c), $[m\,h(c)]^{-1}$, D, d and g(c) for values of c in the range of 2-25.


r h(c) {m h(c)}" D d g(c) 


0.1187 2. 9.891 
0.0421 ; j .366 
0.0232 | 5.999 
0.0168 : 5.017 
0.0136 ; 2. 4.283 
0.0114 ; 2.4% 3.833 
0.0099 f a. 3.529 
0.0087 ; 4. 3.312 
0.0077 ‘ , 3.148 
0.0073 : 2.638 
0.0055 ; 2.517 
0.0044 d Bs 2.440 
0.0039 2.402 
0.0030 > ! 21. .336 


I am much indebted to Professors J. Wolfowitz and J. Kiefer and to Dr. S. 
Kantorovitz for their valuable comments. 
REFERENCE 


[1] David Blackwell, Leo Breiman and A. J. Thomasian, "The capacity of a class of
channels," Ann. Math. Stat., Vol. 30 (1959), pp. 1229-1241.





AN EXPONENTIAL BOUND ON THE STRONG LAW OF LARGE NUMBERS 
FOR LINEAR STOCHASTIC PROCESSES WITH ABSOLUTELY 
CONVERGENT COEFFICIENTS 


By L. H. Koopmans 
Sandia Corporation 
1. Introduction. Let $\{\xi_i: -\infty < i < \infty\}$ be a doubly infinite sequence of
independent, identically distributed random variables which possess a moment
generating function M(t) over an open interval $I_M$. If $\{a_i: 1 \le i < \infty\}$ is a
sequence of numbers for which $\sum_{i=1}^{\infty} |a_i| < \infty$, then the linear process

$\left\{X_k = \sum_{i=1}^{\infty} a_i \xi_{k-i}: 1 \le k < \infty\right\}$

possesses moments of all orders.
Let

$\eta = E(\xi)$, $\mu = \eta \sum_{i=1}^{\infty} a_i$ and $S_n = \sum_{k=1}^{n} X_k$.

The purpose of this paper is to establish the following theorem.
THEOREM. For every $\varepsilon > 0$ there exist constants A and $\rho < 1$ such that

$P\{|n^{-1}S_n - \mu| \ge \varepsilon \text{ for some } n \ge m\} \le A\rho^m.$

2. Preliminaries. The following lemma will be needed for the proof of the 
theorem. 

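LEMMA. Let $\{b_i: 1 \le i < \infty\}$ be a sequence of numbers for which $\sum_{i=1}^{\infty} |b_i| < \infty$
and $\sum_{i=1}^{\infty} b_i > 0$. If $S_n = \sum_{k=1}^{n} X_k$ where $X_k = \sum_{i=1}^{\infty} b_i \xi_{k-i}$, and if $\eta < 0$, then
there exist constants C and $\gamma < 1$ such that $P\{S_n \ge 0\} \le C\gamma^n$.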

Proor. Let Xi, = Pa bé-:, Sar = > set X;,,, and take r > n. By a re- 
arrangement of terms, S,,, may be put in the form 


n—r—l -1 n—1l 


> Br-erte + e? By-2n-ste + 2 Bi nate ; 


k=l—r =n—T 


’ 
. 


where Bnn = > j=mb;. Hence, the moment generating function of S,., is 
n—l po 


(1) Mg,,,(t) = I] M(Bisr-x,rt) I] M(Bisenset) I] M(B, ,t). 
Since the series >>, b; converges, the partial sums B;,; are uniformly bounded 
in i and j fori S j. Thus Msg, _.(t) exists on an open interval Js which is inde- 
pendent of r and n. 

It will now be shown that for each n, Ms, ,(t) converges to a function A(t) > 0 
on J ys as r tends to infinity. 


Received November 8, 1960; revised December 17, 1960. 


The first product in Equation 1 tends to 1 as r becomes infinite, since the 
partial sums B,,,,,, tend to zero for all k and M(t) is continuous at the origin. 

The second product converges (to a non zero limit) for all m and every t ¢ Iys. 
To prove this we apply the test for the absolute convergence of an infinite prod- 
uct (see, e.g., page 15 of [2]). Write 


M ( Bisk.n+it) = ] a M'( One (t) Bise nant) Bise.n sel 


where |@,x| S 1. It is then sufficient to show that 


oo 

De |M"(On.0(t) Bre ntat) Brsi.n-at < @ 
for each n and for every closed subinterval of Js which contains the origin. Since 
M’(t) is continuous, it is clear that M’( 6, 4(t) Bise.n4.¢) is bounded uniformly in 
n, k and in ¢t in the aforementioned closed subintervals of Jys . Hence, we need 
only show ae \Bisz.n+z| < © for all n. 

Let Cy = MaXxiiecicn+s |bi|. Then the sequence !C,} coincides with a subse- 

quence of {\b,;|} except for at most n repetitions of each term. Thus, 


2 n+k 


dX Pieanst S 2. 2, We 


k=l iml+k 


Since the last product in Equation 1 is independent of r, the convergence of 
Msg,,,(t) is established. 

In order to conclude the proof of the lemma it will be shown that the moment 
generating function of S, exists and coincides with A(t) on Iys. Let z = t + tu 
and Ms, .(z) = Ee". Then Msg, (z) is a bilateral Laplace-Stieltjes transform 
which, since |Ms,,(z)| S Ms, ,(t), is analytic in the semi-infinite strip ¢ = 
{z:t € Iys}. The convergence of Ms, _(t) implies that Ms, .(z) is bounded uni- 
formly in r and in z for ¢t in every closed subinterval of Jys which contains the 
origin. Hence, by Vitali’s theorem ([2] page 168), Ms,(z) converges uniformly 
to a limit A(z) for every region bounded by a contour in g. 

The function A(z) is then analytic in o and, since o contains the imaginary 
axis, lim,.. Ms, (iu) = (iu) for all w. Also it is easily seen that l.i.m.,. S,, = 
S, , where l.i.m. denotes limit in the mean of order 2. Hence, by the Lévy conti- 
nuity theorem, A(iu) = Ms,(iu) = Ee™**. But then for all z ino the coefficient 
of z”/m! in the power series expansion of \(z) is the mth moment of S, about 
the origin. It follows that Ms,(t) exists and is equal to A(t) on Iys. 

We have shown that 


Ms,(t) = [] M(Bisenaet) [] M( Biot). 
k=1 k=1 


Since $\sum_{i=1}^{\infty} b_i > 0$, there exists an integer N such that for all $k \ge N$, $\varepsilon < B_{1,k} < \delta$
for some $\varepsilon > 0$ and $\delta < \infty$. Select $t^* > 0$ in $I_{M_S}$ so that

$\gamma = \max[M(\varepsilon t^*), M(\delta t^*)] < 1.$

This is possible since M(0) = 1 and $M'(0) = \eta < 0$. Then, since M(t) is also
convex, $M(ut^*) \le \gamma$ for $\varepsilon \le u \le \delta$. The conclusion of the lemma now follows
from the well known inequality $P\{S_n \ge 0\} \le M_{S_n}(t^*)$ where we may take
$C = \max\{1/\gamma^{N}, \sup_{n > N} \prod_{k=1}^{\infty} M(B_{1+k,n+k}t^*)\}$.
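The inequality $P\{S_n \ge 0\} \le M_{S_n}(t^*)$ can be illustrated in the simplest special case, which is an assumption of this sketch rather than the setting of the lemma: $b_1 = 1$ and all other $b_i = 0$, with $\xi$ normal with mean $\eta < 0$ and unit variance. Then $M(t) = \exp(\eta t + t^2/2)$, the choice $t^* = -\eta$ gives $\gamma = \exp(-\eta^2/2) < 1$, and the exact tail is available through the complementary error function.

```python
import math

def exact_tail(n, eta):
    """P{S_n >= 0} when S_n is a sum of n independent N(eta, 1) variables."""
    return 0.5 * math.erfc(-eta * math.sqrt(n) / math.sqrt(2.0))

def mgf_bound(n, eta, t):
    """M(t)^n with M(t) = exp(eta t + t^2 / 2), the N(eta, 1) m.g.f."""
    return math.exp(n * (eta * t + 0.5 * t * t))

def gamma(eta):
    """Value of M(t*) at t* = -eta, which minimizes M for eta < 0."""
    return math.exp(-0.5 * eta * eta)
```

In this special case the constant C of the lemma can be taken equal to 1, since $M_{S_n}(t^*) = \gamma^n$.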


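3. Proof of the theorem. Let $\{a_i: 1 \le i < \infty\}$ be an arbitrary sequence for
which $\sum_{i=1}^{\infty} |a_i| < \infty$ and let the value of $\eta = E(\xi)$ be arbitrary. Now,

$P\{|n^{-1}S_n - \mu| \ge \varepsilon \text{ for some } n \ge m\} \le \sum_{n=m}^{\infty} P\{|S_n - n\mu| \ge n\varepsilon\}$

$\le \sum_{n=m}^{\infty} [P\{(S_n - n\mu - n\varepsilon) \ge 0\} + P\{(-S_n + n\mu - n\varepsilon) \ge 0\}]$

$\le \frac{A_1 \rho_1^m}{1 - \rho_1} + \frac{A_2 \rho_2^m}{1 - \rho_2} \le 2 \max\left(\frac{A_1}{1 - \rho_1}, \frac{A_2}{1 - \rho_2}\right)[\max(\rho_1, \rho_2)]^m$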

provided $\max(\rho_1, \rho_2) < 1$. Thus the theorem will be proved if it can be shown
that, for $\varepsilon > 0$, $S_n - n\mu - n\varepsilon$ and $-S_n + n\mu - n\varepsilon$ can be translated into
sums of the form considered in the lemma.

It suffices to concentrate on the expression $S_n - n\mu - n\varepsilon$ since the arguments
are the same for both. Write $X_k - \mu = \sum_{i=1}^{\infty} a_i \theta_{k-i}$ where the random
variables $\theta_i = \xi_i - \eta$ have zero expectation. We now analyse three cases.

Case I. $\sum_{i=1}^{\infty} a_i > 0$. Set $\varepsilon' = \varepsilon/\sum_{i=1}^{\infty} a_i$. Then

$X_k - \mu - \varepsilon = \sum_{i=1}^{\infty} a_i(\theta_{k-i} - \varepsilon')$

where $E(\theta_i - \varepsilon') < 0$. The theorem now follows from the Lemma.
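Case II. $\sum_{i=1}^{\infty} a_i < 0$. Write

$X_k - \mu - \varepsilon = \sum_{i=1}^{\infty} (-a_i)(-\theta_{k-i} - \varepsilon')$

where $\varepsilon' = -\varepsilon/\sum_{i=1}^{\infty} a_i$ and again apply the Lemma.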
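Case III. $\sum_{i=1}^{\infty} a_i = 0$.

Let $\sum_{i=1}^{\infty} a_i = \sum^{+} a_i + \sum^{-} a_i$

where $\sum^{+} a_i$ is the sum of the positive terms and $\sum^{-} a_i$ the sum of the negative
terms of the series. Similarly, let $X_k^{+} = \sum^{+} a_i \xi_{k-i}$, $X_k^{-} = \sum^{-} a_i \xi_{k-i}$,
$\mu^{+} = \eta \sum^{+} a_i$ and $\mu^{-} = \eta \sum^{-} a_i$. Then if $S_n^{+} = \sum_{k=1}^{n} X_k^{+}$ and $S_n^{-} = \sum_{k=1}^{n} X_k^{-}$,

(2) $P\{S_n - n\mu - n\varepsilon \ge 0\} \le P\{S_n^{+} - n\mu^{+} - \tfrac12 n\varepsilon \ge 0\} + P\{S_n^{-} - n\mu^{-} - \tfrac12 n\varepsilon \ge 0\}.$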

The two terms on the right hand side of this inequality may be dealt with under 
Cases I and II except when one of the sums, >,* a; or >> a; , contains a finite 
number of terms. In this event, the corresponding process 







{Xi = 2 aé--lsk< 20 } 


is an r dependent process of identically distributed random variables, where r 
is the number of terms in the sum. Then S,., = noe Xx, may be written in 
the form S,,, = > jad Z,,; Where Z,,; is the sum of independent, identically 
distributed random variables obtained by taking every (r + 1)st term of S,,, 
starting with the jth. It is well known (e.g. from [1]) that the existence of M(t) 
and the condition u < 0 are sufficient to guarantee an exponential bound for 
P{Z,,; = 0}; 1 S 7 S r. The bound for S,,, is then easily obtained from the 


inequality 


rT 


P{S., 2 0} SD) P{Z,5 = 0}. 


j=l 


COROLLARY. If the sequence $\{a_i\}$ is doubly infinite with $\sum_{i=-\infty}^{\infty} |a_i| < \infty$, the
conclusion of the theorem applies to the linear process

$\left\{X_k = \sum_{i=-\infty}^{\infty} a_i \xi_{k-i}: 1 \le k < \infty\right\}.$

PROOF. Write $X_k = X_{k,1} + X_{k,2}$ where $X_{k,1} = \sum_{i=1}^{\infty} a_i \xi_{k-i}$ and $X_{k,2} = \sum_{i=-\infty}^{0} a_i \xi_{k-i}$.
Then an inequality analogous to Inequality 2 reduces this to two
applications of the theorem.


REFERENCES 
[1] Herman Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis
based on the sum of observations," Ann. Math. Stat., Vol. 23 (1952), pp. 492-507.

[2] E. C. Titchmarsh, The Theory of Functions, 2nd ed., Oxford University Press,
Cambridge, 1939.





EXPECTED UTILITY FOR QUEUES SERVICING MESSAGES WITH 
EXPONENTIALLY DECAYING UTILITY 


By Frank A. Haight


Institute of Transportation and Traffic Engineering, University of California, 
Los Angeles 


1. Introduction. When a sequence of messages arrives at some center, they may 
form a queue, owing to delay in reading or processing each message. If the use- 
fulness of a message decays with the lapse of time (as might occur in military 
operations) it would be important to handle incoming messages with a view to 
minimizing the loss of utility. In particular, the order in which items are handled 
assumes a greater importance than in other queueing problems. We consider here 
a single queue of this sort, and, with some distribution-theoretic restrictions, 
derive expressions for the expected (terminal) utility in two cases: (a) most 
recent, and (b) least recent message serviced first, with both random and regular 
departures. 


2. Assumptions. Consider a single queue of messages, in equilibrium, and 
assume that each message has associated with it at time ¢ after entry, a utility 
subject to exponential decay. We investigate the loss of utility due to queueing 
delay in several different circumstances. In each case $\lambda$ denotes the mean arrival
rate, $\mu$ the mean departure rate, $\rho = \lambda/\mu$. No messages are removed from the
queue without completion of service.

If the initial utility of a message (at the time of entry into the queue) is
denoted by $y_0$, the waiting time in the queue (exclusive of service time) by w
and the final utility (when entering service) by y, then we assume $y_0$ and w to
be independent random variables, with

(1) $y = y_0 e^{-\beta w},$

where $\beta$ is the same for all messages. We also assume the distribution of initial
utility to be Type III, i.e.,

(2) $dF(y_0) = K e^{-p y_0} y_0^{q-1}\,dy_0$, $0 < y_0 < \infty$,

where $K\Gamma(q) = p^q$. We shall use equations (1) and (2) to determine E(y),
and in some circumstances the distribution of y also. If the Laplace transform
of the distribution of w is $\varphi(s)$, then $E(y) = (q/p)\varphi(\beta)$.


3. First come, first served; Poisson service. This means that messages are 
taken off the bottom of the pile, and that both arrivals and departures occur at 


random instants. Then the distribution of queue length N (including the message 
being serviced) is 


(3) $p_n = \text{Prob}(N = n) = (1 - \rho)\rho^n.$


Received October 8, 1959; revised August 22, 1960. 




The waiting time of a message will consist of n negative exponential phases
with probability $p_n$. Its distribution therefore consists of a discrete magnitude
$p_0 = 1 - \rho$ at the origin, together with the continuous component

(4) $h(w) = \lambda(1 - \rho)e^{-(\mu - \lambda)w}$, $0 < w < \infty$.

Since $y_0$ and w are independent, we can now write down the distribution of y.
It will consist of the sum of two terms, corresponding to the two components of
the distribution of w. The first of these corresponds to $y = y_0$, occurs with
probability $1 - \rho$ and is therefore

(5) $(1 - \rho)K e^{-py} y^{q-1}.$


The second term of the distribution of y is found by integrating yo out of the 
joint distribution of yo and y: 


Keys "(1 — p) exp [— (A/u) log (y/yo)](1/By) 


since dw = dy/By. Letting 


T(n, x) -/ et” dt 


denote the incomplete gamma function, we have for the second term of the 
distribution of y 


(6) M1 — 0) (py) r(« had, pu). 


BI (q)y ae 


E(y) can be found easily from this expression, or from the Laplace transform of
(4) (with the discrete element added), and turns out to be

(7) $E(y) = (q/p)\left[1 - \frac{\rho\beta}{\beta + \mu - \lambda}\right].$

If we choose time units so that $\lambda = 1$, and units of utility so that
$E(y_0) = (q/p) = 1$, and let $a = (\log 2)/\beta$ denote the half-life of information,
then we can tabulate E(y) as a function of service rate and half-life. The values
in Table 1 have been obtained.


TABLE 1
Expected terminal utility as percentage of expected initial utility; first come, first served;
negative exponential service

           a = 1     a = 2     a = 3     a = 4     a = 5
μ = 1      0         0         0         0         0
μ = 2      .79531    .87131    .91421    .92615    .93912
μ = 3      .91421    .95077    .96548    .97342    .97792
μ = 4      .95308    .97411    .98212    .98635    .98896
μ = 5      .97046    .98405    .98908    .99170    .99330
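Entries of Table 1 can be recomputed from the closed form $E(y) = 1 - \rho\beta/(\beta + \mu - \lambda)$ under the stated units ($\lambda = 1$, $q/p = 1$, $\beta = (\log 2)/a$). The sketch below is a minimal illustration; the function name is hypothetical and the reading of rows as service rates $\mu$ and columns as half-lives a is an assumption checked only against the corner entries.

```python
import math

def fcfs_poisson_utility(mu, half_life, lam=1.0):
    """E(y) / E(y0) for first come, first served with Poisson service.

    Uses E(y) = 1 - rho * beta / (beta + mu - lam) with rho = lam / mu
    and beta = log(2) / half_life (natural logarithm).
    """
    beta = math.log(2.0) / half_life
    rho = lam / mu
    return 1.0 - rho * beta / (beta + mu - lam)

# Rows mu = 2..5, columns a = 1..5, as in the layout assumed for Table 1.
table = [[fcfs_poisson_utility(mu, a) for a in range(1, 6)] for mu in range(2, 6)]
```

For $\mu = \lambda$ the expression reduces to 0, corresponding to a queue that never reaches equilibrium.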







TABLE 2
Expected terminal utility as percentage of expected initial utility; first come, first served;
regular service

           a = 1     a = 2     a = 3     a = 4     a = 5
μ = 1      0         0         0         0         0
μ = 2      .86591    .92437    .94715    .95921    .96631
μ = 3      .94924    .97286    .98148    .98632    .98866
μ = 4      .97345    .98602    .99084    .99281    .99466
μ = 5      .98366    .99136    .99430    .99599    .99635


4. First come, first served; regular departures. In case service is regular rather
than Poisson, we need only replace (3) by a formula quoted by Saaty [2], p. 177,
formula (16). The calculations leading to the distribution of y are then rather
cumbersome, and will be omitted. We can obtain the mean value of this quantity
by the simpler method, using the formula for the Laplace transform of h(w)
quoted by Kendall [5], p. 156, formula (16):

(8) $\varphi(s) = \frac{(1 - \rho)(s/\mu)}{(s/\mu) + \rho(e^{-s/\mu} - 1)}.$

Making the same choice of units as in the previous section, the values in Table 2
are obtained.
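Formula (8) can be evaluated directly at $s = \beta$. The sketch below assumes the same units as before ($\lambda = 1$, $q/p = 1$, $\beta = (\log 2)/a$) and a hypothetical function name; it reproduces the tabulated values only to about four decimal places, which presumably reflects the precision of the original computation.

```python
import math

def fcfs_regular_utility(mu, half_life, lam=1.0):
    """phi(beta) from formula (8): regular service, first come, first served."""
    beta = math.log(2.0) / half_life
    rho = lam / mu
    x = beta / mu
    return (1.0 - rho) * x / (x + rho * (math.exp(-x) - 1.0))

# Example: service rate mu = 2 and half-life a = 1.
value = fcfs_regular_utility(2, 1)
```

Comparing with the Poisson-service formula of Section 3 shows the expected ordering: regular service loses less utility than exponential service at the same rate.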


5. Last come, first served; regular departures. In this case messages are taken 
off the top of the pile. The number of service phases (each of length 1/u) which 
will delay a given message is by no means the number of waiting messages en- 
countered. However, the probability of an empty queue is the same, and there- 
fore the first term of the distribution of y is given by (5) in this case also. 

To find the second term of the distribution, consider a new arrival to the
queue, which is not empty. Define an auxiliary queue to consist of this arrival
and all subsequent arrivals. Then the probability $\pi_n$, $n = 1, 2, \cdots$ that the
original arrival will be preceded into service by n other messages (including the
one being serviced at his arrival time) is the probability that the auxiliary queue
(beginning with one member) will discharge exactly n messages before first
becoming empty.

Therefore the continuous component of the distribution of w is

(9) $h(w) = \mu\rho\pi_n$, $(n - 1)/\mu < w < n/\mu$, $n = 1, 2, \cdots$

and of y for fixed $y_0$ is

(10) $h(y \mid y_0) = \frac{\mu\rho\pi_n}{\beta y}$, $y_0 \exp[-\beta(n - 1)/\mu] > y > y_0 \exp(-n\beta/\mu)$,
$n = 1, 2, \cdots$.


An extra factor $\rho$ has been introduced into (9) which was not required in the
corresponding formula of Section 3. This is because the distribution $\pi_n$ is
defined over the integers excluding zero, whereas the queue length distribution (3)
is defined over all non-negative integers. Thus the required integral

$\int_0^{\infty} h(w)\,dw = \rho$

obtains automatically in that case, but must be produced artificially in this one.

Denoting the second term of the density of y by fo(y), its contribution to the 
expected value of y by E.(y), and making the convenient abbreviation 
ad, = exp (nB/n), we have 


(11) fly) = > : Key dy, 
n=l “an_1y By 
and therefore 
(12) may) = [of Ke meye dys dy, 
0 n=l “Gn_iy B 
Using the transformation 
2 = (Yo — Gnay)/(Yo — Gny) 
from y to z, we obtain after integration 
(13) E.(y) = (Aq Bp) (e*"” — l)r(e ey 


where $\pi(s) = \sum \pi_n s^n$ is the probability generating function of the $\pi_n$
distribution.

Now we consider the $\pi_n$ distribution itself, and its generating function $\pi(s)$.
The most unfavorable situation for an entry into the queue is that taking place
just at the beginning of a service time. For this case Borel [1] has given the value
for $\pi_n$, namely

$\pi_n = [n^{n-2}/\Gamma(n)]e^{-n\rho}\rho^{n-1},$


for which the generating function is (cf., Haight and Breuer [3]) 


x n” 
n(s) = sexp >, 
n 


n=l 


n 


1 
Eke a 2). 


! een 
TABLE 3
Approximate expected terminal utility as percentage of expected initial utility; last come,
first served; regular service

           a = 1     a = 2     a = 3     a = 4     a = 5
μ = 1      .33466    .45492    .52272    .56822    .59057
μ = 2      .84089    .90260    .92936    .94446    .95434
μ = 3      .93811    .96572    .97626    .98206    .98530
μ = 4      .96835    .98304    .98866    .99115    .99331
μ = 5      .98098    .99006    .99326    .99520    .99574





Using this approximation, we find that 
(14) E:(y) = (Aq/Bp)(1 — e*) exp (¢ — p), 


where ce ° = exp [—p — (8/u)]|. With the same choice of units and definition 
of a, formula (14) yields the values in Table 3. Next, we obtain exact values
for $\pi_n$; this constitutes one generalization of Borel's distribution. Another
generalization is that of Tanner [7], [3] and still another will appear in the
following section.

Let t be the time remaining between an arrival and the first service termination;
this quantity is rectangularly distributed over the interval $(0, 1/\mu)$. The
probability of x arrivals in the time t is therefore

(15) $\int_0^{1/\mu} \mu\, e^{-\lambda t}\frac{(\lambda t)^x}{x!}\,dt = \frac{1}{\rho}\,\frac{\gamma(x + 1, \rho)}{\Gamma(x + 1)},$

where $\Gamma(n) = \Gamma(n, z) + \gamma(n, z)$. If, in addition to t, any complete service
periods must be waited, the probabilities of x arrivals during one of these are 
the simple Poisson expressions. In order that the queue beginning with one 
member shall vanish for the first time when exactly n members have passed 
through it requires exactly n — 1 arrivals in the n service periods (including 
the fractional one), subject to the restrictions that there will be no arrivals in 
the last period, no more than one in the next to last, and so forth. As an occu- 
pancy problem, we want to put n — 1 balls into n — 1 boxes so that, reading 
from left to right at least as many balls are passed as boxes. 

Combining (15) for the fractional period with the Poisson terms for the whole 
periods in this way we obtain 


$\pi_1 = \gamma(1, \rho)/\rho,$
$\pi_2 = \gamma(2, \rho)/\rho e^{\rho},$
$\pi_3 = [\gamma(3, \rho) + 2\rho\gamma(2, \rho)]/2\rho e^{2\rho},$


and in general 


(16) Tn ae 


» (" ’) p” “y(t, p)(n — 1)""* 


T'(n)pe—e 


To find the generating function for this distribution, multiply x, by s” and add 
the terms containing y(7, p) separately for each 7. The first of these is simply 
(s/p)y(1, p). The succeeding values of 7 yield infinite series of the type men- 
tioned by Bromwich [6], p. 160, example 4. If we write the nth sum in the form 


(17) Anls"y(n, p))/[T (ne 
then, using the example of Bromwich, the A, satisfy the equation 


(18) (spe”)""An = [(log An)/(m — 1)]"”. 





Hence the generating function can be written 


e” y(n, p) ca 
9 a(s) = — — (se ai 
(19) (s) 9 St T(n) (se™*)"A 


, 


An attempt to use (19) with (13) to compute #.(y) leads to very substantial 

calculations, most particularly in connection with finding the A, from (18). 
The short method of Laplace transforms also gives an expression for E2(y) 

which is awkward to compute. Tanner [7] gives the Laplace transform of the 


delay in the form ¢(s) = o*‘, where ¢ is the fractional service period and a satis- 
fies 


(20) log ¢ = p(o — 1) + (s/n). 


Averaging over t, we find 


, (o — 1) 
21) E, ie egies iinet 
-” aly) p(s — 1) — (8/n) 
where 


(99) log ¢ = p(o — 1) + (B/u). 


6. Last come, first served; Poisson service. To deal with this case we need
first a formula generalizing Borel's distribution to negative exponential service
time. Since departures are random, there is now no distinction to be observed
regarding the fractional service time on entry to the queue. The probability of
x arrivals in a single service interval is

(23) $\int_0^{\infty} \mu e^{-\mu v}(1/x!)(\lambda v)^x e^{-\lambda v}\,dv = \rho^x(1 + \rho)^{-x-1}.$

Thus $\pi_n$ are all of the form

(24) $\pi_n = K_n[\rho^{n-1}/(1 + \rho)^{2n-1}],$

where $K_n$ represents the number of ways these arrivals can occur subject to the
restrictions mentioned in the last section.

The values of $K_n$ can be found by use of Cauchy's theorem in much the same
way as Borel used the theorem to evaluate the coefficients in the simpler case.
Bateman [4] also refers to these numbers (p. 230) in a different context; both
methods yield the expression

(25) $K_n = \frac{1}{n}\binom{2n - 2}{n - 1}.$


Bateman also gives the generating function of the $K_n$

(26) $G(s) = \sum_{n=1}^{\infty} K_n s^n = \tfrac12 - \tfrac12(1 - 4s)^{1/2},$

which is useful in finding $E_2(y)$.
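The numbers $K_n$ of (25) and the closed form (26) are easy to cross-check. The sketch below (function names and the truncation length are assumptions of this illustration) sums the power series for a value of s inside the radius of convergence 1/4 and compares it with the closed form.

```python
import math

def K(n):
    """K_n = (1/n) * C(2n - 2, n - 1), computed exactly in integers."""
    return math.comb(2 * n - 2, n - 1) // n

def G_series(s, terms=200):
    """Partial sum of sum_{n >= 1} K_n s^n (converges for |s| < 1/4)."""
    return sum(K(n) * s ** n for n in range(1, terms + 1))

def G_closed(s):
    """Closed form 1/2 - (1/2) * sqrt(1 - 4 s)."""
    return 0.5 - 0.5 * math.sqrt(1.0 - 4.0 * s)
```

The first few values of `K(n)` are 1, 1, 2, 5, 14.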
Given n, there must have been n departures and n - 1 arrivals between the
arrival of the particular message and its entry into service. The spacings between
these 2n - 1 events are each distributed with density $(\lambda + \mu)\exp[-(\lambda + \mu)x]$,
and therefore


; oo 1 Qn So 9 p” (A + yp) wr 
h(w ae SS eee ee 
iw nai 1 ( n— ’) (1 + p)*"* T(2n — 1)e® 


‘ ‘ —1 2n-—2 
e tee = 1 2n —_ 2 Mn ww” 
sarn\n—1/ (2n — 2)!’ 


sig a wo <1 f2n — 2 n n—l, 2n—2 
¢(8) = [ 68 a ( n E de 
0 


leading to 


Sin\n —1/ (Qn — 2)! 


- 


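or, in terms of the generating function G(s),

(28) $\varphi(\beta) = [(\lambda + \mu + \beta)/\mu]\,G[\lambda\mu/(\lambda + \mu + \beta)^2].$

Using (26), we obtain

(29) $\varphi(\beta) = E_2(y) = [(\lambda + \mu + \beta)/\mu]\{\tfrac12 - \tfrac12[1 - 4\lambda\mu/(\lambda + \mu + \beta)^2]^{1/2}\}.$

Formula (29) has been used in computing Table 4.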
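A short computation shows how formula (29) leads to the tabulated figures. The sketch below assumes the units of Section 3 ($\lambda = 1$, $q/p = 1$, $\beta = (\log 2)/a$) and adds the contribution $1 - \rho$ of messages that find the queue empty to $E_2(y)$; that combination, like the function name, is an assumption of this illustration, supported by the agreement with Table 4 to about four decimals.

```python
import math

def lcfs_poisson_utility(mu, half_life, lam=1.0):
    """(1 - rho) + E_2(y), with E_2(y) taken from formula (29)."""
    beta = math.log(2.0) / half_life
    rho = lam / mu
    s = lam + mu + beta
    e2 = (s / mu) * (0.5 - 0.5 * math.sqrt(1.0 - 4.0 * lam * mu / (s * s)))
    return (1.0 - rho) + e2

# Example: service rate mu = 2 and half-life a = 1.
value = lcfs_poisson_utility(2, 1)
```

Unlike the first come, first served discipline, the expected utility stays positive even when $\mu = \lambda$.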


TABLE 4
Expected terminal utility as percentage of expected initial utility; last come, first served;
negative exponential service

           a = 1     a = 2     a = 3     a = 4     a = 5
μ = 1      .44476    .55964    .62110    ...       .69059
μ = 2      .82961    .88946    .91727    .93366    .94453
μ = 3      .92114    .95353    .96695    .97433    .97902
μ = 4      .95524    .97486    .98232    .98657    .98911
μ = 5      .97133    .98434    .98922    .99177    .99335


I am grateful to the referee for pointing out a number of errors in earlier ver- 
sions of this paper, and to Mr. John Riordan for confirming formula (16). 
REFERENCES 

[1] Borel, Émile, "Sur l'emploi du théorème de Bernoulli pour faciliter le calcul d'une
infinité de coefficients. Application au problème de l'attente à un guichet,"
Comptes Rendus Acad. Sci. (Paris), Vol. 214 (1942), pp. 452-6.

[2] Saaty, Thomas L., "Résumé of useful formulas in queueing theory," Operations
Research, Vol. 5 (1957), pp. 161-200.

[3] Haight, F. A. and Breuer, M. A., "The Borel-Tanner distribution," Biometrika, Parts
1 and 2, Vol. 47 (1960), pp. 143-150.

[4] Erdélyi, et al, (Bateman Manuscript Project) Higher Transcendental Functions, Vol.
III. McGraw Hill Book Co., New York, 1955.

[5] Kendall, D. G., "Some problems in the theory of queues," J. Roy. Stat. Soc., Ser. B,
Vol. 13 (1951), pp. 151-185.

[6] Bromwich, T. J. I'A., An Introduction to the Theory of Infinite Series, Macmillan and
Co., Ltd., London, 1955.

[7] Tanner, J. C., "A problem of interference between two queues," Biometrika, Vol. 40
(1953), pp. 58-69.








ON THE CODING THEOREM FOR THE NOISELESS CHANNEL¹
By Patrick Billingsley
University of Chicago 

1. Introduction. The purpose of this paper is to examine the coding theorem 
for a noiseless channel from a point of view different from the usual one. The 
idea is to take the base s expansion of a point in the unit interval as a realization 
of the stochastic process to be coded, and then to relate the compression a given 
coding achieves to known properties of the unit interval, properties connected 
with Hausdorff dimension and the Shannon-McMillan theorem. This leads to 
results which in certain ways are sharper than the ones previously obtained. 

Let $\Omega = (0, 1]$ and let $\mathcal{B}$ consist of the Borel subsets of $\Omega$. With each
$\omega$ we associate its nonterminating base $s$ expansion:
$\omega = \sum_{n=1}^{\infty} x_n(\omega)/s^n$, where $x_n(\omega) = 0, 1, \ldots, s - 1$. Then each
$x_n$ is a measurable function on $\Omega$. If $\mu$ is a probability measure on $\mathcal{B}$
then $\{x_1, x_2, \ldots\}$ becomes a stochastic process.
Moreover, any stochastic process with state space (or alphabet)

$$\sigma = \{0, 1, \ldots, s - 1\}$$

can be represented in this form, provided it is atomless. More precisely, let
$\{p(a_1, \ldots, a_n)\}$ be any consistent set of finite-dimensional distributions with
the property that

$$\lim_{n\to\infty} p(a_1, \ldots, a_n) = 0$$

for any sequence $(a_1, a_2, \ldots)$ of elements of $\sigma$. Then there exists a measure
$\mu$ on $\mathcal{B}$ such that

$$\mu\{\omega: x_k(\omega) = a_k,\ k = 1, \ldots, n\} = p(a_1, \ldots, a_n).$$

Clearly $\mu$ will be atomless, or continuous. We will be concerned with such atomless
measures $\mu$ under which the process $\{x_n\}$ is stationary and ergodic, that is,
with measures $\mu$ such that if $T$ is defined by $T\omega = [s\omega]$ then $T$ preserves
$\mu$ and is ergodic under $\mu$. This representation of a process has been used for other
purposes by Harris [7].

For the purposes of this paper a code is a continuous, nondecreasing function
$\phi$ on $[0, 1]$ with $\phi(0) = 0$ and $\phi(1) = 1$. With each $\omega$ we associate the
nonterminating base $s$ expansion of $\phi(\omega)$:
$\phi(\omega) = \sum_{n=1}^{\infty} y_n(\omega)/s^n$, where

$$y_n(\omega) = 0, 1, \ldots, s - 1.$$

Thus $\phi$ is a scheme for associating with each sequence $x = (x_1, x_2, \ldots)$ of
symbols from $\sigma$ another such sequence $y = (y_1, y_2, \ldots)$. (For simplicity of


Received May 17, 1960. 

1 Research carried out at the Statistical Research Center, University of Chicago, under 
partial sponsorship of the Statistics Branch, Office of Naval Research. Reproduction in 
whole or in part is permitted for any purpose of the United States Government. 







A NOISELESS CHANNEL CODING THEOREM 595 


notation we consider only codes with the same input and output alphabets.)
The code $\phi$ has the desirable property that for any atomless probability measure
on $\mathcal{B}$, there is probability one that in order to determine the first $n$ elements of
$y$ one need only know a finite number of elements of $x$. A second desirable property
would be that $x$ can be uniquely recovered from $y$. We will, for each probability
measure on $\mathcal{B}$, produce a code for which this recoverability condition
holds, with probability one, and which is optimal, in a certain sense, among all
codes, even those not having this property.

Note that if $\phi$ is simply assumed to be a mapping from $[0, 1]$ to $(0, 1]$ such
that for all $x$ and $y$, $x$ is uniquely recoverable from $y$ and the first $n$ elements of
$y$ are determined by some finite number of elements of $x$, then it follows that $\phi$
is continuous and either increasing or decreasing. The definition above constitutes
a slight weakening of these requirements; there is no real loss of generality
in excluding the decreasing case and requiring $\phi(0) = 0$, $\phi(1) = 1$.

The efficiency of a code is measured by the amount it compresses a sequence
$(x_1, \ldots, x_n)$. For any $\omega \in \Omega$ and $n \ge 1$, let

$$u_n(\omega) = \{\omega': x_k(\omega') = x_k(\omega),\ k = 1, \ldots, n\}.$$

Thus $u_n(\omega)$ is that $s$-adic interval of rank $n$, that is, that interval of the form
$(l/s^n, (l + 1)/s^n]$, which contains $\omega$. Now if the first $n$ symbols of the expansion
of $\omega$ are known then $u = u_n(\omega)$ is known, and it is known that $\phi(\omega) \in \phi(u)$.
If $\langle\phi(u)\rangle$ denotes the smallest $s$-adic interval containing $\phi(u)$, then the number
of symbols in the expansion of $\phi(\omega)$ which can be determined at this stage is
exactly the rank of the $s$-adic interval $\langle\phi(u)\rangle$. But the rank of $\langle\phi(u)\rangle$ is clearly
$-\lg_s \lambda(\langle\phi(u)\rangle)$, where $\lambda$ denotes Lebesgue measure. Thus the first $n$ symbols in
the expansion of $\omega$ determine exactly $-\lg_s \lambda(\langle\phi(u_n(\omega))\rangle)$ symbols in the expansion
of $\phi(\omega)$. Therefore the compression effected by the code $\phi$ on the first $n$ symbols
in the expansion of $\omega$ is

(1.1) $$C_n(\omega) = -n^{-1} \lg_s \lambda(\langle\phi(u_n(\omega))\rangle).$$

To simplify the mathematics we will, in the first part of the paper, replace
$C_n(\omega)$ by

(1.2) $$D_n(\omega) = -n^{-1} \lg_s \lambda(\phi(u_n(\omega))).$$

(See the remarks at the end of the following paragraph.)
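The two quantities can be computed exactly for a concrete code. The following is a minimal sketch, assuming the reading $C_n = n^{-1}\cdot(\text{rank of the smallest } s\text{-adic interval containing } \phi(u_n(\omega)))$ and $D_n = -n^{-1}\lg_s \lambda(\phi(u_n(\omega)))$; the function names and the sample code $\phi(t) = t^2$ are illustrative choices, not part of the paper.

```python
from fractions import Fraction
from math import ceil, floor, log

def s_adic_interval(w: Fraction, n: int, s: int):
    """Endpoints (a, b] of the rank-n s-adic interval containing w in (0, 1]."""
    l = ceil(w * s**n) - 1
    return Fraction(l, s**n), Fraction(l + 1, s**n)

def rank_of_smallest_s_adic(c: Fraction, d: Fraction, s: int) -> int:
    """Largest r such that (c, d] lies inside some (m/s^r, (m+1)/s^r]."""
    if d <= c:
        raise ValueError("the image interval must have positive length")
    r = 0
    while True:
        m = floor(c * s**(r + 1))
        if d * s**(r + 1) > m + 1:   # rank r+1 no longer contains (c, d]
            return r
        r += 1

def compression(phi, w: Fraction, n: int, s: int):
    """Return (C_n, D_n) for a code phi that maps Fractions to Fractions."""
    a, b = s_adic_interval(w, n, s)
    c, d = phi(a), phi(b)
    C_n = Fraction(rank_of_smallest_s_adic(c, d, s), n)
    D_n = -log(float(d - c), s) / n
    return C_n, D_n

# Illustrative code phi(t) = t^2 in base 2, at w = 1/3 with n = 3.
C3, D3 = compression(lambda t: t * t, Fraction(1, 3), 3, 2)
```

For the identity code both quantities equal 1, since an $s$-adic interval of rank $n$ is its own smallest $s$-adic cover.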
A code is efficient if $C_n(\omega)$ is small in some asymptotic sense. Let

$$C_\phi(\omega) = \lim_{n\to\infty} C_n(\omega),$$

if this limit exists, and let

$$\underline{C}_\phi(\omega) = \liminf_{n\to\infty} C_n(\omega).$$

Define $D_\phi(\omega)$ and $\underline{D}_\phi(\omega)$ similarly in terms of $D_n(\omega)$. Suppose we are given a
stationary, atomless, ergodic probability measure $\mu$. If $F(a) = \mu(0, a]$, that is
if $F$ is the distribution function corresponding to $\mu$, then $F$ is a code. It is shown





596 PATRICK BILLINGSLEY 


in Section 2 that for this code we have

(1.3) $$\mu\{\omega: D_F(\omega) = h\} = 1,$$

where $h$ is the relative entropy of $\{x_n\}$ under $\mu$. (The relative entropy is the
entropy divided by $\lg s$; see [8].) It is shown that in a special case $F$ reduces to
Fano coding. In Section 3 it is shown that if $\phi$ is any code then

(1.4) $$\mu\{\omega: \underline{D}_\phi(\omega) < h\} = 0.$$

Thus $F$ achieves maximum efficiency. The methods used to establish (1.4) are
those of Hausdorff dimension theory [1, 2]. In Section 4 we investigate the extent
to which it is possible to replace $D_n(\omega)$ by $C_n(\omega)$ in these results. In particular,
it is shown that (1.3) still holds if $D_F(\omega)$ is replaced by $C_F(\omega)$.

In [9] Kinney has exhibited a relation between Hausdorff dimension and the 
capacity of a noiseless channel in which the letters are of different durations. In 
this paper the letters are assumed all to have the same duration, so that the 
channel has capacity lg s. 

For treatments of the noiseless coding theorem from other points of view see 
Feinstein [4] and Khinchine [8]. 


2. The Code F. As in Section 1, let $F(a) = \mu(0, a]$. Then, as a code, $F$ has the
desirable property that the set of $\omega$ such that $F(\omega) = F(\omega')$ for some $\omega' \ne \omega$
has $\mu$-measure 0. In other words the original sequence can be recovered from the
encoded sequence, with probability one.

THEOREM 2.1. If $\mu$ is atomless, stationary and ergodic, then

(2.1) $$\mu\{\omega: D_F(\omega) = h\} = 1,$$

where $h$ is the relative entropy of $\{x_n\}$ under $\mu$.

PROOF. Since $F(a) = \mu(0, a]$ and $\mu$ is atomless, $\lambda(F(u)) = \mu(u)$ for any
interval $u$. (This is just a paraphrase of the assertion that if $X$ is a random
variable with continuous distribution function $H(x)$, then $H(X)$ is a random
variable which is uniformly distributed on the unit interval.) Therefore

$$D_n(\omega) = -n^{-1} \lg_s \mu(u_n(\omega)).$$

And now (2.1) follows immediately from Breiman's version of the Shannon-McMillan
theorem [3].

Note that the coded process, defined by $F(\omega) = \sum_{n=1}^{\infty} y_n(\omega)/s^n$, is independent
and satisfies $\mu\{\omega: y_n(\omega) = i\} = 1/s$. From this it follows that $F$ does
not commute with the shift $T\omega = [s\omega]$, that is, $F(T\omega)$ and $TF(\omega)$ are in general
distinct. For otherwise the processes $\{x_n\}$ and $\{y_n\}$ would be conjugate (see [6]),
which they are not (unless $F(\omega) = \omega$).

Note also that since $D_n(\omega)$ converges to $h$ in $L_1(\mu)$, we have $\int_\Omega D_n(\omega)\,\mu(d\omega) \to h$,
so that the average compression is also $h$ in the limit.
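For an independent process the convergence in Theorem 2.1 is easy to observe numerically, because $\mu(u_n(\omega))$ is then the product of the symbol probabilities. The sketch below is only an illustration under that independence assumption; the probabilities, the seed, the sample size, and the function names are hypothetical choices.

```python
import math
import random

def relative_entropy(p, s):
    """Entropy of the symbol distribution p, divided by lg s."""
    return -sum(q * math.log(q, s) for q in p if q > 0)

def D_n(symbols, p, s):
    """-(1/n) lg_s mu(u_n(omega)) when the symbols are independent with law p."""
    return -sum(math.log(p[x], s) for x in symbols) / len(symbols)

s = 4
p = [0.5, 0.25, 0.125, 0.125]          # illustrative symbol probabilities
rng = random.Random(0)                  # fixed seed so the run is repeatable
xs = rng.choices(range(s), weights=p, k=20000)

h = relative_entropy(p, s)              # limiting value of D_n
observed = D_n(xs, p, s)                # value along one simulated sequence
```

The general stationary ergodic case needs the Shannon-McMillan theorem cited above; the independent case shown here follows from the strong law of large numbers alone.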


3. The General Code. We now show that the code $F$ of the preceding section
is optimal in the sense that no code $\phi$ has a compression ratio smaller than $h$.
Specifically, we have the following result.







THEOREM 3.1. If $\mu$ is atomless, stationary and ergodic, and if $\phi$ is any code, then

(3.1) $$\mu\{\omega: \underline{D}_\phi(\omega) < h\} = 0,$$

where $h$ is the relative entropy of $\{x_n\}$ under $\mu$.

PROOF. Let $\nu$ be the probability measure on $\mathcal{B}$ with $\phi$ as its distribution function.
Since $\phi$ is continuous, $\nu$ has no atoms and for any interval $u$, $\nu(u) = \lambda(\phi(u))$.
Therefore

$$D_n(\omega) = -\frac{1}{n} \lg_s \nu(u_n(\omega)) = \frac{\lg_s \nu(u_n(\omega))}{\lg_s \mu(u_n(\omega))} \left\{-\frac{1}{n} \lg_s \mu(u_n(\omega))\right\}.$$

Since the second factor on the right goes to $h$ almost everywhere, we have

$$\underline{D}_\phi(\omega) = h \liminf_{n\to\infty} \frac{\lg \nu(u_n(\omega))}{\lg \mu(u_n(\omega))}$$

except on a set of $\mu$-measure 0. Therefore, in order to prove (3.1), it suffices to
show that if $\delta < h$ then

(3.2) $$\mu\left\{\omega: \liminf_{n\to\infty} \frac{\lg \nu(u_n(\omega))}{\lg \mu(u_n(\omega))} \le \frac{\delta}{h}\right\} = 0.$$

We prove (3.2) by using results from the theory of Hausdorff dimension. For
any set $M \subset \Omega$ and probability measure $\mu$ on $\mathcal{B}$ we define the dimension $\dim_\mu M$
of $M$ relative to $\mu$ in the following way. Consider a sum $\sum_i \mu(v_i)^\alpha$, where $\alpha > 0$
and $\{v_i\}$ is a collection of $s$-adic intervals covering $M$ (that is, $M \subset \bigcup_i v_i$), with
$\mu(v_i) < \rho$ for each $i$. The infimum of such sums we denote by $L_\mu(M, \alpha, \rho)$. As
$\rho$ decreases to 0, $L_\mu(M, \alpha, \rho)$ increases to a limit $L_\mu(M, \alpha)$, and $\dim_\mu M$ is
defined by

$$\dim_\mu M = \sup\{\alpha: L_\mu(M, \alpha) = \infty\} = \inf\{\alpha: L_\mu(M, \alpha) = 0\}.$$

See [1, 2] for the details. If $\mu$ is Lebesgue measure, then $\dim_\mu M$ is the classical
Hausdorff dimension of $M$ (see [1, Section 3]).

The result relevant to coding theory is the following one (Theorem 2.1 of [2]).

If $\mu$ and $\nu$ are probability measures on $\mathcal{B}$, and if $\mu$ is atomless, then

(3.3) $$\dim_\mu\left\{\omega: \liminf_{n\to\infty} \frac{\lg \nu(u_n(\omega))}{\lg \mu(u_n(\omega))} \le \delta\right\} \le \delta.$$

Applying this result with $\delta$ replaced by $\delta/h$ we have

$$\dim_\mu\left\{\omega: \liminf_{n\to\infty} \frac{\lg \nu(u_n(\omega))}{\lg \mu(u_n(\omega))} \le \frac{\delta}{h}\right\} \le \frac{\delta}{h} < 1$$

if $\delta < h$. But any set of positive $\mu$-measure has $\mu$-dimension 1. This proves (3.2)
and the theorem.
It is possible to prove (3.2) without explicitly introducing the notion of Haus- 
dorff dimension. This is somewhat unsatisfactory since it removes the arguments 
used to establish (3.3) from their natural context. In any case, the proof goes as 







follows. Take $\epsilon > 0$ with $\delta/h < 1 - \epsilon$ and let $A$ be the set of $\omega$ for which
$\nu(u_n(\omega)) \ge \mu(u_n(\omega))^{1-\epsilon}$ for infinitely many $n$. Since the set in brackets in (3.2)
is contained in $A$, it suffices to show that $\mu(A) = 0$. Let $\rho$ be an arbitrarily small
positive number and let $\mathcal{U}$ be the set of those $s$-adic intervals $u_n(\omega)$, with $\omega \in A$, for
which $\mu(u_n(\omega)) < \rho$ and $\nu(u_n(\omega)) \ge \mu(u_n(\omega))^{1-\epsilon}$. From the definition of $A$
and the fact that $\mu$ is atomless it follows that the elements of $\mathcal{U}$ cover $A$. Let $\mathcal{V}$
consist of those elements of $\mathcal{U}$ which are not subsets of other elements of $\mathcal{U}$.
Then the collection $\mathcal{V}$ of $s$-adic intervals covers $A$ and is disjoint. Moreover
$\mu(v) < \rho$ and $\nu(v) \ge \mu(v)^{1-\epsilon}$ for any $v \in \mathcal{V}$. Therefore

$$1 \ge \sum_{v \in \mathcal{V}} \nu(v) \ge \sum_{v \in \mathcal{V}} \mu(v)^{1-\epsilon} \ge \rho^{-\epsilon} \sum_{v \in \mathcal{V}} \mu(v) \ge \rho^{-\epsilon} \mu(A).$$

Thus $\mu(A) \le \rho^{\epsilon}$ for any $\rho > 0$, and it follows that $\mu(A) = 0$. The point is that
if $\omega$ lies in $A$ then the function $\phi$ is increasing very rapidly at $\omega$, and if $\mu(A)$
were positive, the function $\phi$ could not remain bounded.

In Section 1 we made the assumption that $\phi(0) = 0$ and $\phi(1) = 1$. If we only
assume $0 \le \phi(0) \le \phi(1) \le 1$, we can define $\nu$ by $\nu(0, a] = \phi(a) - \phi(0)$. Then
$\nu$ will be a finite measure, though not a probability measure, and the above
argument still goes through.

Since $\underline{D}_\phi(\omega) \ge h$ except on a set of $\mu$-measure 0, an application of Fatou's
lemma yields

(3.4) $$\liminf_{n\to\infty} \int_\Omega D_n(\omega)\,\mu(d\omega) \ge h.$$

Thus $h$ is the minimal average compression as well as the minimum in the sense
of (3.1). With somewhat different definitions and assumptions, Khinchine has
proved (3.4) with the limit inferior replaced by a limit superior (see pp. 23 ff.
of [8]).
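The lower bound can be illustrated in the simplest setting. Suppose, as an assumption made only for this sketch, that the process is independent with symbol probabilities $p_i$ and that the code is the distribution function of another independent measure with probabilities $q_i > 0$. Then $\nu(u_n(\omega)) = \prod_k q_{x_k(\omega)}$, and by the strong law of large numbers $D_n(\omega)$ converges almost surely to $-\sum_i p_i \lg_s q_i$, which is never smaller than $h$. The function name below is hypothetical.

```python
import math

def limiting_compression(p, q, s):
    """Almost-sure limit of D_n when the source has law p and the code is
    the distribution function of the product measure with law q."""
    return -sum(pi * math.log(qi, s) for pi, qi in zip(p, q) if pi > 0)

s = 4
p = [0.5, 0.25, 0.125, 0.125]      # illustrative source probabilities
uniform = [0.25] * 4               # code phi(w) = w, i.e. no recoding at all

h = limiting_compression(p, p, s)              # matched code attains h
unmatched = limiting_compression(p, uniform, s)  # mismatched code does worse
```

Choosing $q = p$ recovers the code $F$ of Section 2, and any other choice of $q$ gives a limit at least as large.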

As an example suppose that $s = 4$ and that under $\mu$ the process $\{x_n\}$ is
independent with $\mu\{\omega: x_n(\omega) = i\} = p_i$, where $p_0 = \tfrac{1}{2}$, $p_1 = \tfrac{1}{4}$, and $p_2 = p_3 = \tfrac{1}{8}$.
Fano coding (see [10]) proceeds in the following manner. Each symbol $x_n$ in
the $x$-sequence is replaced by a set of 0's and 1's according to the following rule.

0 → 0
1 → 10
2 → 110
3 → 111.

Thus $(x_1, x_2, \ldots)$ is replaced by a sequence of binary digits. These digits are
then grouped in two's and put in base four again by the rule

00 → 0
01 → 1
10 → 2
11 → 3.







For example, $(0, 1, 3, 2, 0, \ldots)$ becomes $(1, 1, 3, 3, 0, \ldots)$. If, for each $\omega$, the
sequence $(x_1(\omega), x_2(\omega), \ldots)$ is transformed in this fashion into a sequence
$(y_1, y_2, \ldots)$, and if we define $\phi(\omega) = \sum_{n=1}^{\infty} y_n/4^n$, then it is not difficult to
show that $\phi$ is just the distribution function corresponding to the measure $\mu$
defined above. Therefore, by Theorems 2.1 and 3.1, the code $\phi$ is optimal and
achieves a compression of $h = \tfrac{7}{8}$, as is well known.
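The two-stage rule can be written out directly. This is a minimal sketch of the recoding described in the example, with a hypothetical function name; trailing bits that do not yet fill a pair are held back rather than emitted.

```python
# Binary codewords for the four input symbols, as in the rule above.
FANO = {0: "0", 1: "10", 2: "110", 3: "111"}

def fano_base4(symbols):
    """Replace each base-4 symbol by its codeword, then regroup the bits in pairs."""
    bits = "".join(FANO[x] for x in symbols)
    usable = len(bits) - len(bits) % 2      # an odd leftover bit is not yet decodable
    return [int(bits[i:i + 2], 2) for i in range(0, usable, 2)]

encoded = fano_base4([0, 1, 3, 2, 0])

# Expected output symbols per input symbol: mean codeword length divided by 2.
p = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
ratio = sum(p[x] * len(FANO[x]) for x in p) / 2
```

The expected codeword length is $\tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{8}\cdot 3 + \tfrac{1}{8}\cdot 3 = \tfrac{7}{4}$ bits per input symbol, and each output symbol carries two bits, which gives the ratio $\tfrac{7}{8}$.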

In the preceding example we showed that a code was optimal for a process by
observing that, viewed as a function on $[0, 1]$, it is the distribution function
corresponding to the process. As a second example, we construct an optimal
code by starting from the distribution function. Suppose that $s = 3$ and that
$\{x_n\}$ is independent with $\mu\{\omega: x_n(\omega) = i\} = p_i$, $p_0 = p_2 = \tfrac{1}{2}$, $p_1 = 0$. In this
case the corresponding distribution function is just the Cantor function. Therefore
the optimal coding rule is to replace each 2 in the $x$-sequence by a 1 and to
convert the resulting sequence of 0's and 1's, viewed as a binary expansion, to
base 3. The resulting compression ratio, $\lg 2/\lg 3$, is just that achieved by
converting from base 2 to base 3.
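The first half of this rule, replacing each 2 by a 1 and reading the result in binary, evaluates the Cantor function on a finite digit string. The sketch below is illustrative only: it works with a finite prefix of the expansion rather than the nonterminating expansion used in the text, and the function name is hypothetical.

```python
import math
from fractions import Fraction

def cantor_code(ternary_digits):
    """Value of the Cantor function on 0.d1 d2 ... (base 3), digits in {0, 2}:
    replace each 2 by 1 and read the result as a binary expansion."""
    if any(d not in (0, 2) for d in ternary_digits):
        raise ValueError("digits must be 0 or 2; the digit 1 has probability 0 here")
    return sum(Fraction(d // 2, 2 ** (k + 1)) for k, d in enumerate(ternary_digits))

value = cantor_code([0, 2, 2, 0])       # binary 0.0110
ratio = math.log(2) / math.log(3)       # compression ratio lg 2 / lg 3
```

Each input symbol carries one binary digit of information, while each base-3 output symbol can carry $\lg 3/\lg 2$ binary digits, which is where the ratio comes from.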


4. Replacement of $D_n$ by $C_n$. In Section 2 and Section 3 we used $D_n(\omega)$, as
defined by (1.2), instead of $C_n(\omega)$, as defined by (1.1), to simplify the mathematics.
Now $C_n(\omega)$ and $D_n(\omega)$ have the same asymptotic properties for any $\omega$
for which

(4.1) $$\lim_{n\to\infty} \frac{\lg_s \lambda(\langle\phi(u_n(\omega))\rangle)}{\lg_s \lambda(\phi(u_n(\omega)))} = 1.$$

Fix $\omega$ and let $y = \phi(\omega)$, $(y - \epsilon_n, y + \delta_n] = \phi(u_n(\omega))$, and let $v_n$ be the smallest
$s$-adic interval containing $(y - \epsilon_n, y + \delta_n]$. We will first determine conditions
on $y$, in terms of its nonterminating base $s$ expansion $y = \sum_{i=1}^{\infty} y_i/s^i$, which
ensure that

(4.2) $$\lim_{n\to\infty} \frac{\lg_s \lambda(v_n)}{\lg_s (\epsilon_n + \delta_n)} = 1.$$

For each $n$, let $N_n = N_n(y)$ be the length of the run of either 0's or $(s - 1)$'s
following $y_n$ in the expansion of $y$. That is, determine $N_n$ by the requirement
that either

$$y_{n+1} = y_{n+2} = \cdots = y_{n+N_n} = 0 \ne y_{n+N_n+1}$$

or else

$$y_{n+1} = y_{n+2} = \cdots = y_{n+N_n} = s - 1 \ne y_{n+N_n+1}.$$

If $y_{n+1}$ is neither 0 nor $s - 1$ then $N_n = 0$.
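The run length $N_n(y)$ is simple to compute from a list of digits. The following sketch assumes the digits $y_1, y_2, \ldots$ are supplied as a finite Python list (so `digits[0]` is $y_1$); the function name is hypothetical, and a run that reaches the end of the list is reported only as far as the available digits go.

```python
def run_length(digits, n, s):
    """N_n: length of the run of 0's or of (s-1)'s that follows y_n.

    digits[k] holds y_{k+1}; the list is a finite prefix of the expansion.
    """
    if not 0 <= n < len(digits):
        raise ValueError("need at least one digit after position n")
    first = digits[n]                   # this is y_{n+1}
    if first not in (0, s - 1):
        return 0                        # y_{n+1} is neither 0 nor s-1
    k = 0
    while n + k < len(digits) and digits[n + k] == first:
        k += 1
    return k

# Decimal example: y = 0.30007992...
digits = [3, 0, 0, 0, 7, 9, 9, 2]
N1 = run_length(digits, 1, 10)          # run of 0's after y_1
```

Long runs are exactly the places where the smallest $s$-adic interval containing a short interval around $y$ is much longer than the interval itself, which is why the condition $N_n(y)/n \to 0$ appears in the theorem that follows.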


THEOREM 4.1. If $\epsilon_n \downarrow 0$, $\delta_n \downarrow 0$, $\epsilon_n + \delta_n > 0$, and if $\lim_n N_n(y)/n = 0$
then (4.2) holds.

PROOF. Since $\lg_s \lambda(v_n)/\lg_s (\epsilon_n + \delta_n) \le 1$, it suffices to show that







$$\liminf_{n\to\infty} \frac{\lg_s \lambda(v_n)}{\lg_s (\epsilon_n + \delta_n)} \ge 1.$$

The $s$-adic interval $v_n$ can be determined in the following way. Let $\nu(n)$ be the
largest integer $\nu$ such that

(4.3) $$(y - \epsilon_n, y + \delta_n] \subset \left(\sum_{i=1}^{\nu} \frac{y_i}{s^i},\ \sum_{i=1}^{\nu} \frac{y_i}{s^i} + \frac{1}{s^{\nu}}\right].$$

That $\nu(n)$ is finite follows from the assumption that $\epsilon_n + \delta_n > 0$. Now $v_n$ is
just the right-hand member of (4.3) with $\nu = \nu(n)$. Therefore $\lg_s \lambda(v_n) = -\nu(n)$
and we must prove that

$$\liminf_{n\to\infty} \frac{\nu(n)}{-\lg_s (\epsilon_n + \delta_n)} \ge 1.$$

From the fact that $\nu(n)$ is maximal it follows that either $y - \epsilon_n < \sum_{i=1}^{\nu(n)+1} y_i/s^i$
or else $y + \delta_n > \sum_{i=1}^{\nu(n)+1} y_i/s^i + 1/s^{\nu(n)+1}$. Hence, writing $\bar{y}_i = s - 1 - y_i$,
one or the other of the relations

$$\epsilon_n > \sum_{i=\nu(n)+2}^{\infty} \frac{y_i}{s^i}, \qquad \delta_n > \sum_{i=\nu(n)+2}^{\infty} \frac{\bar{y}_i}{s^i}$$

holds. But the right-hand member of each of these two inequalities is not less
than $s^{-(\nu(n)+2+N_{\nu(n)+2})}$. Therefore $-\lg_s (\epsilon_n + \delta_n) \le \nu(n) + 2 + N_{\nu(n)+2}$, and it
suffices to prove that

$$\liminf_{n\to\infty} \frac{\nu(n)}{\nu(n) + 2 + N_{\nu(n)+2}} \ge 1.$$

Since $\epsilon_n + \delta_n$ goes to 0, $\nu(n)$ goes to infinity as $n$ does. Hence it is enough to
show that

$$\liminf_{k\to\infty} \frac{k}{k + N_k} \ge 1.$$

But this follows immediately from the assumption that $\lim_k N_k/k = 0$.

There remains the question of the size of the $y$-set where $\lim_n N_n(y)/n = 0$.

THEOREM 4.2. The set of $y$ in the unit interval for which $\lim_{n\to\infty} N_n(y)/n = 0$
has Lebesgue measure 1.

PROOF. Since $\lambda\{y: N_n(y) \ge n\epsilon\} \le 2s^{-n\epsilon+1}$, it follows from the Borel-Cantelli
lemma that $\lambda\{y: N_n(y) \ge n\epsilon \text{ i.o.}\} = 0$. From this the theorem follows immediately.
(It is possible to prove the stronger result that $N_n(y) = O(\lg n)$ except
on a set of Lebesgue measure 0. See problem 5, p. 197 of [5].)

It follows immediately from Theorems 4.1 and 4.2 that we can replace $D_F(\omega)$
by $C_F(\omega)$ in Theorem 2.1. In fact, if $U$ is the set of $y$ for which $\lim_n N_n(y)/n = 0$
then $\lambda(U) = 1$. But $F(\omega) \in U$ if and only if $\omega \in F^{-1}U$, and $\mu(F^{-1}U) = \lambda(U) = 1$.
Therefore (4.1) holds except for $\omega$ in a set of $\mu$-measure 0.

Similarly we can replace $\underline{D}_\phi(\omega)$ by $\underline{C}_\phi(\omega)$ in Theorem 3.1, provided
$\mu(\phi^{-1}U) = 1$. General conditions under which this holds seem difficult to obtain.













REFERENCES 

[1] Patrick Billingsley, "Hausdorff dimension in probability theory," Ill. J. Math.,
Vol. 4 (1960), pp. 187-209.

[2] Patrick Billingsley, "Hausdorff dimension in probability theory II," Ill. J. Math.,
to appear.

[3] Leo Breiman, "The individual ergodic theorem of information theory," Ann. Math.
Stat., Vol. 28 (1957), pp. 809-811; "Correction note," Ann. Math. Stat., Vol. 31
(1960), pp. 809-810.

[4] Amiel Feinstein, Foundations of Information Theory, McGraw-Hill, New York, 1958.

[5] William Feller, An Introduction to Probability Theory and its Applications, 2nd ed.,
John Wiley and Sons, New York, 1957.

[6] Paul R. Halmos, Entropy in Ergodic Theory, mimeographed notes, The University
of Chicago, 1959.

[7] T. E. Harris, "On chains of infinite order," Pacific J. Math., Vol. 5 (1955), pp. 707-724.

[8] A. I. Khinchine, Mathematical Foundations of Information Theory, Dover Pub.,
New York, 1957.

[9] J. R. Kinney, "Singular functions associated with Markov chains," Proc. Amer.
Math. Soc., Vol. 9 (1958), pp. 603-608.

[10] C. Shannon, "A mathematical theory of communication," Bell System Tech. J.,
Vol. 27 (1948), pp. 379-423.





NOTES 


THE ESSENTIAL COMPLETENESS OF THE CLASS OF GENERALIZED
SEQUENTIAL PROBABILITY RATIO TESTS¹


By M. H. DeGroot
Carnegie Institute of Technology 


1. Introduction and summary. Consider a sequential decision problem in which
independent observations are to be taken on a random variable $X$ whose distribution
is of the form

(1) $$dG_\theta(x) = \psi(\theta) e^{\theta x}\, d\mu(x),$$

where the parameter $\theta$ lies in a given interval $\Omega$ of the real line but is otherwise
unknown, and the measure $\mu(x)$ is either absolutely continuous or discrete. The
problem is to decide between the hypotheses

(2) $$H_1: \theta < \theta^*, \qquad H_2: \theta > \theta^*,$$

where $\theta^*$ is a given point of $\Omega$.

Under the assumptions stated in Section 2 the class A of generalized sequential
probability ratio tests is shown to be essentially complete relative to the class D
of decision functions with bounded risk. A decision function $\delta$ belongs to the
class A if and only if after taking $n$ observations,

(i) $\delta$ depends on the observations only through $n$ and $v_n = \sum_{i=1}^{n} X_i$;

(ii) $\delta$ specifies a closed interval $J_n: [a_{1n}, a_{2n}]$ for each $n$ and the following rule
of action:

(a) Stop experimentation as soon as $v_n \notin J_n$. If $v_n < a_{1n}$, accept $H_1$. If
$v_n > a_{2n}$, accept $H_2$.

(b) If $a_{1n} < v_n < a_{2n}$, take another observation.

(c) If $a_{1n} < a_{2n}$ and $v_n = a_{in}$, accept $H_i$ or take another observation or
randomize between these two ($i = 1, 2$).
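The rule of action in (ii) can be sketched as a short loop over the observations. This is only an illustration: the function name, the return convention, and the boundary functions are hypothetical, and on a boundary point, where rule (c) leaves the choice open, the sketch simply continues sampling.

```python
def gsprt(observations, a1, a2):
    """Apply the stopping rule (a)-(c) to a sequence of observations.

    a1(n) and a2(n) give the endpoints of the interval J_n = [a1(n), a2(n)].
    Returns (decision, n), where decision is "H1", "H2", or None if the
    supplied observations run out before the rule stops.
    """
    v = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        v += x                      # v_n = X_1 + ... + X_n
        lo, hi = a1(n), a2(n)
        if v < lo:
            return "H1", n          # rule (a): below J_n
        if v > hi:
            return "H2", n          # rule (a): above J_n
        # Rules (b) and (c): inside J_n or on its boundary. Assumption of
        # this sketch: on the boundary we take another observation.
    return None, n

# Illustrative constant boundaries; a real test would let them depend on n.
decision, stopped_at = gsprt([1, 1, 1, 1], lambda n: -2.0, lambda n: 2.5)
```

A randomized choice on the boundary, which rule (c) also permits, could be added by passing in a random number generator; it is left out here to keep the sketch deterministic.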

The problem considered here is the same as that treated by Sobel [1], and the 
foregoing statement of the problem and conclusion, as well as the assumptions 
to be given in the next section, follow his work very closely. The contribution 
of this paper is that the requirement of bounded loss functions made by Sobel is 
removed. 


2. Assumptions. Let $W(\theta, j)$, $j = 1, 2$, denote the loss incurred in accepting
$H_j$ when $\theta$ is the value of the parameter. It is assumed that there exist values


Received August 12, 1959. 


1 This research was supported in part by the National Science Foundation under grant 
NSF-G9662. 







COMPLETENESS OF GENERALIZED 8.P.R.T.’S 


$\theta_1 \le \theta^* \le \theta_2$ such that

$$W(\theta, 1) = 0 \ \text{ for } \theta \le \theta_1,$$
$$W(\theta, 1) > 0 \ \text{ for } \theta > \theta_1,$$
$$W(\theta, 2) = 0 \ \text{ for } \theta \ge \theta_2,$$
$$W(\theta, 2) > 0 \ \text{ for } \theta < \theta_2.$$

Thus, the zone of indifference may have positive length or may contain only
the point $\theta^*$. It is assumed that $W(\theta, 1)$ and $W(\theta, 2)$ are finite for all $\theta$ and that
$W(\theta, 1)$ is a nondecreasing and $W(\theta, 2)$ a non-increasing function on $\Omega$.

Let $C(n)$ denote the cost of taking $n$ observations. It is assumed that $C(0) = 0$
and that for $n \ge 1$,

(4) $$C(n) = c_1 + \cdots + c_n,$$

where $c_n$ is the cost of taking the $n$th observation. It is also assumed that for
some positive $K$,

(5) $$\liminf_{n\to\infty} c_n = K.$$

Let $S$ denote the smallest interval containing all possible values of $(\sum_{i=1}^{n} x_i)/n$
for all $n$, $n = 1, 2, \ldots$. Define

(6) $$g_t(\theta) = \psi(\theta) e^{\theta t}$$

for all $t \in S$.

It is assumed that corresponding to each $t \in S$, there exists $\theta(t)$ such that
$g_t(\theta)$ is strictly increasing for $\theta < \theta(t)$ and strictly decreasing for $\theta > \theta(t)$. It is also
assumed that for every $\epsilon > 0$ there exists $t \in S$ such that $\theta^* < \theta(t) < \theta^* + \epsilon$.
(This assumption is needed in Corollary 5 of [1].) It is readily checked that these
assumptions are satisfied for the family of normal, binomial, or Poisson distributions.

The risk function $r(\theta, \delta)$ of a decision function $\delta$ is

$$r(\theta, \delta) = W(\theta, 1) \Pr(\text{Accepting } H_1 \mid \theta, \delta) + W(\theta, 2) \Pr(\text{Accepting } H_2 \mid \theta, \delta) + E(C(N) \mid \theta, \delta),$$

where $N$ is the total number of observations taken.


3. Proof of the essential completeness. It will now be shown that under the
assumptions of Section 2, the class A of generalized sequential probability ratio
tests is essentially complete relative to the class D of decision functions with
bounded risk. Sobel [1] proved this theorem under the additional assumption that
both $W(\theta, 1)$ and $W(\theta, 2)$ are bounded functions on $\Omega$. The proof to be given here
leans heavily on this result.

Let $\Omega$ be the interval from $\underline{\theta}$ to $\bar{\theta}$. If $\underline{\theta}$ and $\bar{\theta}$ are finite and included in $\Omega$, then





604 M. H. DEGROOT 


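it follows from the assumptions of Section 2 that $W(\theta, 1)$ and $W(\theta, 2)$ are
bounded by $W(\bar{\theta}, 1)$ and $W(\underline{\theta}, 2)$, respectively, and the desired result follows
from Sobel's theorem. Hence, the proof will be given here for the situation where
$\Omega$ is the open interval $\underline{\theta} < \theta < \bar{\theta}$ with either end point possibly infinite. The
modifications necessary when $\Omega$ is a half-open interval will be obvious.

Consider the sequence of problems $P^{(i)}$, $i = 1, 2, \ldots$, defined by the loss
functions

(8)
$$W^{(i)}(\theta, 1) = W(\theta, 1) \ \text{ for } \theta \le \theta_2^{(i)},$$
$$W^{(i)}(\theta, 1) = W(\theta_2^{(i)}, 1) \ \text{ for } \theta > \theta_2^{(i)},$$
$$W^{(i)}(\theta, 2) = W(\theta, 2) \ \text{ for } \theta \ge \theta_1^{(i)},$$
$$W^{(i)}(\theta, 2) = W(\theta_1^{(i)}, 2) \ \text{ for } \theta < \theta_1^{(i)},$$

where

(9) $$\theta_1 > \theta_1^{(1)} > \theta_1^{(2)} > \cdots, \qquad \theta_2 < \theta_2^{(1)} < \theta_2^{(2)} < \cdots$$

and

(10) $$\lim_{i\to\infty} \theta_1^{(i)} = \underline{\theta}, \qquad \lim_{i\to\infty} \theta_2^{(i)} = \bar{\theta}.$$

Thus, the functions $W^{(i)}(\theta, 1)$ and $W^{(i)}(\theta, 2)$ are bounded for each $i$,
$i = 1, 2, \ldots$. The sampling cost function $C(n)$ for each problem $P^{(i)}$ is assumed
to be the same as that for the original problem.

Let $\delta$ be any decision function for the original problem, with risk $r(\theta, \delta)$.
Then, for each $i$, $\delta$ is also a decision function for the problem $P^{(i)}$, with risk
$r^{(i)}(\theta, \delta)$. Furthermore, for each $\theta \in \Omega$,

(11) $$r^{(1)}(\theta, \delta) \le r^{(2)}(\theta, \delta) \le \cdots,$$

(12) $$r^{(i)}(\theta, \delta) \le r(\theta, \delta)$$

and

(13) $$\lim_{i\to\infty} r^{(i)}(\theta, \delta) = r(\theta, \delta).$$

A stronger statement than (13) can be made; namely, for each $\theta \in \Omega$ there exists
an integer $k_\theta$ such that $r^{(i)}(\theta, \delta) = r(\theta, \delta)$ for $i \ge k_\theta$.

Now let $\delta \in$ D be any decision function with bounded risk $r(\theta, \delta)$. By Sobel's
theorem, for each $i$ ($i = 1, 2, \ldots$) there exists $\delta_i \in$ A such that

(14) $$r^{(i)}(\theta, \delta_i) \le r^{(i)}(\theta, \delta) \ \text{ for all } \theta \in \Omega.$$

It follows from Wald, [2], Theorem 3.2, and Sobel, [1], Theorem 1 and its proof,
that there exists a subsequence $\{\delta_{i_j}\}$ of the sequence of decision functions $\{\delta_i\}$
that converges in the regular sense (see [1], p. 321, for definition) to a decision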




PROBLEM IN SURVIVAL 


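function $\delta_0$, and that

(15) $$\liminf_{j\to\infty} r^{(k)}(\theta, \delta_{i_j}) \ge r^{(k)}(\theta, \delta_0)$$

for each $k$ ($k = 1, 2, \ldots$) and all $\theta \in \Omega$. Furthermore, the decision function $\delta_0$
can be taken to be in the class A. It will be shown that

(16) $$r(\theta, \delta_0) \le r(\theta, \delta) \ \text{ for all } \theta \in \Omega.$$

Choose and fix $\theta \in \Omega$. It follows from (13) that (16) will be proven if it can
be shown that

(17) $$r^{(k)}(\theta, \delta_0) \le r(\theta, \delta) \ \text{ for } k = 1, 2, \ldots.$$

Accordingly, let $k$ be any positive integer and let $\epsilon > 0$ be an arbitrary positive
number. By (15), an integer $i$ can be chosen large enough so that $i \ge k$ and

(18) $$r^{(k)}(\theta, \delta_i) > r^{(k)}(\theta, \delta_0) - \epsilon.$$

Hence, from (18), (11), (14), and (12),

$$r^{(k)}(\theta, \delta_0) - \epsilon < r^{(k)}(\theta, \delta_i) \le r^{(i)}(\theta, \delta_i) \le r^{(i)}(\theta, \delta) \le r(\theta, \delta).$$

Since $\epsilon$ was arbitrary, $r^{(k)}(\theta, \delta_0) \le r(\theta, \delta)$. This completes the proof.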


REFERENCES 


[1] Milton Sobel, "An essentially complete class of decision functions for certain standard
sequential problems," Ann. Math. Stat., Vol. 24 (1953), pp. 319-337.
[2] Abraham Wald, Statistical Decision Functions, John Wiley and Sons, New York, 1950.




A PROBLEM IN SURVIVAL¹
By James B. MacQueen


University of California, Los Angeles 


1. Introduction. Suppose that at a given time an individual has certain re- 
sources. These are used up at a specified rate, but from time to time 
“opportunities” arrive; at an opportunity a decision is made and the resources 
are changed—increased or decreased—in a random manner depending on the de- 
cision. If the resources ever fall to zero, the individual ‘‘perishes.’’ The problem 
is to make the decision at each opportunity which will minimize the probability 
of ultimately perishing. 


Received September 19, 1960. 

1 The form of the problem considered here was suggested in substantial part by work
of Lester E. Dubins and Leonard J. Savage on optimal gambling, presented by Dubins
at the departmental seminar, Department of Statistics, University of California, Berkeley,
in October, 1959.

This research was supported by a grant from the Western Management Science Institute, 
University of California, Los Angeles. 





606 JAMES B. MACQUEEN 


A model for approximating certain situations of this type is described below 
and the optimal policy is established. The problem is well suited for treatment 
by means of the “principle of optimality” [1], and an elegant theorem due to 
Blackwell [2]. 


2. The Model. Let $x(t)$ be the resources of the individual at time $t$. Should
$x(t) = 0$ for the first time at $t^*$, let $x(t) = 0$ for $t \ge t^*$. Let the rate at which
resources are used be a constant which without loss of generality may be taken
to be unity. Thus in time $t$ the resources will be reduced by an amount $t$. Opportunities
are distributed in time in accordance with a Poisson process with
constant intensity which may be taken to be unity, again without loss of generality,
so that on the average one opportunity arrives per unit time. Each such
opportunity is characterized by a family of distributions which depends on the
resources available when the opportunity arrives. A decision consists of selecting
a distribution from this family. Then a number $w$ is randomly chosen according
to this distribution and the resources are changed instantaneously by that amount
so that if an opportunity arrives at $t$ and $w$ is chosen, $x(t+) = x(t) + w$. The
family of distributions available if an opportunity arrives when the resources
are $x$, $\mathcal{F}_x$, consists of all distributions $\{F(w; x)\}$ on the interval $[-x, \infty)$ with
fixed positive expectation $\mu$. Thus the individual cannot lose more than all his
resources at the time the opportunity arrives. Since the individual perishes with
probability 1 if $\mu \le 1$, it will be assumed that $\mu > 1$.

A policy is a rule $R$ specifying for every $x$ the choice of a single distribution
$F(w; x, R) \in \mathcal{F}_x$ to be used should an opportunity arrive when the amount of
resources is $x$. (Clearly the policy does not depend on time.)

It will be shown that the policy which is optimal in the sense that it mini- 
mizes the probability of perishing consists of always choosing the distribution 
with zero variance; i.e., the individual always prefers a sure thing among the 
class of risks with equal expectations. 
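The claim can be illustrated by simulation. The sketch below compares the zero-variance policy with one particular risky alternative of the same expectation; the risky distribution (lose everything or gain $2\mu + x$, each with probability one half), the cap used to declare survival, the seed, and the function names are all assumptions made for this illustration.

```python
import random

def perishes(x0, mu, risky, rng, cap=60.0):
    """Simulate one history; return True if the resources reach zero.

    Resources fall at unit rate, opportunities arrive as a rate-1 Poisson
    process, and a history is counted as surviving once resources exceed cap.
    """
    x = x0
    while x < cap:
        wait = rng.expovariate(1.0)     # time until the next opportunity
        if wait >= x:
            return True                 # resources run out first
        x -= wait
        if risky:
            if rng.random() < 0.5:
                return True             # w = -x: everything is lost
            x += 2.0 * mu + x           # w = 2*mu + x, so that E[w] = mu
        else:
            x += mu                     # sure thing: w = mu
    return False

def perish_frequency(x0, mu, risky, trials, seed):
    rng = random.Random(seed)
    return sum(perishes(x0, mu, risky, rng) for _ in range(trials)) / trials

f_sure = perish_frequency(1.0, 2.0, risky=False, trials=4000, seed=1)
f_risky = perish_frequency(1.0, 2.0, risky=True, trials=4000, seed=2)
```

The cap introduces a small truncation error, since a history that has reached a large value of $x$ could still perish later; with $\mu > 1$ that probability is negligible for the cap chosen here.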


3. The Optimal Policy. Let $t_0$ be the time at which the problem starts and let
$t_i$, $i = 1, 2, \ldots$, be the time at which the $i$th opportunity arrives. Let $x_0$ be
the capital at $t_0$ and let $x_i$ be the resources at $t_i$ after they are changed by the
outcome of the venture at that time. In case $x(t) = 0$ at some time $t^* < t_i$,
then $x_i = 0$, $x_{i+1} = 0, \ldots$. For convenience define $F_0(y; x, R) = F(y - x; x, R)$
and let $F_0(y; 0, R) = 1$ for $y \ge 0$. Then, for a fixed policy $R$, the sequence
$x_0, x_1, \ldots$ forms a Markov process on the state space $[0, \infty)$ with constant
transition distribution,

(1) $$G_{x,R}(y) = \Pr\{x_{i+1} \le y \mid x_i = x, R\} =
\begin{cases} 1 & \text{for } x = 0,\ y \ge 0, \\ e^{-x} + \int_0^x F_0(y; x - t, R) e^{-t}\, dt & \text{for } x > 0,\ y \ge 0. \end{cases}$$

Let $S^n$ be the event that $x_i > 0$ for $i = 0, 1, \ldots, n$. Let $F^n$ be the event that
$x_i = 0$ for at least one $i \le n$. Let $P(F^n \mid x, R) = 1 - P(S^n \mid x, R)$ be the probability
of $F^n$ when $x_0 = x$ and policy $R$ is used. Let

(2) $$p(x, R) = \lim_{n\to\infty} P(F^n \mid x, R).$$

The principle of optimality provides a necessary condition for a policy $R^*$ to
minimize $p(x, R)$:

(3) $$p(x, R^*) = \min_R \left\{\int_0^\infty p(y, R^*)\, dG_{x,R}(y)\right\}
= \min_R \left\{e^{-x} p(0, R^*) + \int_0^x \int_0^\infty p(y, R^*)\, dF_0(y; x - t, R)\, e^{-t}\, dt\right\}.$$

This relation is satisfied by

(4) $$p(x, R^*) = e^{-ax},$$

where $a$ is the positive solution to the equation $1 - a = e^{-a\mu}$. A policy $R^*$ corresponding
to (4) is given by

(5) $$F(w; x, R^*) = F_0(w + x; x, R^*) = \begin{cases} 1 & \text{for } w \ge \mu, \\ 0 & \text{for } w < \mu. \end{cases}$$

A proof that policy (5) yields (4) can be obtained from Kendall [3].
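The constant $a$ and the fixed-point property of $e^{-ax}$ under policy (5) can be checked numerically. The sketch below assumes the reading $1 - a = e^{-a\mu}$ and, for the sure-thing policy, the one-step relation $p(x) = e^{-x} + \int_0^x p(x - t + \mu)\, e^{-t}\, dt$; the bisection bracket, the quadrature rule, and the function names are choices made for this illustration.

```python
import math

def solve_a(mu, iterations=100):
    """Positive root of 1 - a = exp(-a*mu) for mu > 1, by bisection.

    f(a) = 1 - a - exp(-a*mu) is concave with f(0) = 0, has its maximum at
    a = ln(mu)/mu, and f(1) < 0, so the positive root lies in (ln(mu)/mu, 1).
    """
    f = lambda a: 1.0 - a - math.exp(-a * mu)
    lo, hi = math.log(mu) / mu, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def one_step(x, mu, a, steps=20000):
    """e^{-x} + integral_0^x exp(-a(x - t + mu)) e^{-t} dt, by the midpoint rule."""
    h = x / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += math.exp(-a * (x - t + mu)) * math.exp(-t)
    return math.exp(-x) + total * h

mu = 2.0                                # illustrative value with mu > 1
a = solve_a(mu)
gap = abs(one_step(1.5, mu, a) - math.exp(-a * 1.5))
```

The quantity `gap` measures how far $e^{-ax}$ is from reproducing itself after one step at $x = 1.5$; it should be zero up to quadrature error.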
For equation (4) to be a solution to (3) requires

(6) $$p(x, R^*) = \int_0^\infty p(y, R^*)\, dG_{x,R^*}(y)$$

and

(7) $$p(x, R^*) \le \int_0^\infty p(y, R^*)\, dG_{x,R}(y).$$

Equation (6) is the less informative condition on $p(x, R^*)$ since it must hold
for any probability that depends only on the state of the process. However, if
(6) fails, certainly $p(x, R^*)$ is not the desired function. From (4) and (5) the
right side of (6) is

$$e^{-x} + \int_0^x e^{-a(x - t + \mu)} e^{-t}\, dt,$$

and using the condition $1 - a = e^{-a\mu}$, this reduces to $e^{-ax}$, showing (6) is satisfied.

Equation (4) will satisfy (7) if, for any distribution $F_0$ with expectation
$m = x - t + \mu$,

(8) $$g(a) = \int_0^\infty e^{-a(y - m)}\, dF_0(y) \ge 1.$$

Unless $F_0$ is degenerate at $m$, $g(a)$ is convex and has a unique minimum. Differentiation
is permissible and shows that the minimum is achieved at $a = 0$.
But $g(0) = 1$. In case $F_0$ is degenerate at $m$, $g(a) = 1$.







On the other hand, for any distribution $F_0$ minimizing (8), $g(a) = 1$ and
$\int_0^\infty e^{-ay}\, dF_0(y) = e^{-am}$, hence the uniqueness theorem for Laplace transforms
implies $F_0$ is degenerate at $m$. Consequently, among those policies $R$ for which
$p(x, R)$ satisfies (3), $p(x, R) = e^{-ax}$ only for $R = R^*$ given by (5). This means
that if $R^*$ is optimal, it is also unique.

As it happens, any constant satisfies (3), as well as equation (4), so that it
is not yet clear that (4) provides an optimal policy. However, by means of the
previously mentioned theorem of Blackwell, optimality can readily be established.
A convenient, special version of this theorem is given below as Theorem 1.
Theorem 2 provides a condition under which the hypotheses of Theorem 1 are
satisfied.

Let $x_0, x_1, \ldots$ be a Markov process with arbitrary state space $\Omega$ and transition
probabilities $P_{x,R}(A) = \Pr\{x_{n+1} \in A \mid x_n = x \text{ and policy } R \text{ is used at } x_n\}$
defined for every $x$, every set $A \subset \Omega$, and every policy $R$. Suppose that for every
$R$ there is a certain class of states $T$ for which $P_{x,R}(T) = 1$ for $x \in T$. Let $F^n$
be the event that $x_i \in T$ for some $i \le n$. Let $S^n$ be the event that
$x_0, x_1, \ldots, x_n \in \Omega - T$. Let $p(x, R) = \lim_{n\to\infty} P(F^n \mid x, R) =
\lim_{n\to\infty} \Pr\{F^n \mid x_0 = x \text{ and } R \text{ is used at } x_0, x_1, \ldots\}$. Let $p_N(x, R^N) =
\lim_{n\to\infty} P(F^n \mid x, R^N) = \lim_{n\to\infty} \Pr\{F^n \mid x_0 = x \text{ and } R \text{ is used } N \text{ times, at } x_0, x_1,
\ldots, x_{N-1}, \text{ and } R^* \text{ is used thereafter}\}$.

THEOREM 1. If (i) $p(x, R^*)$ satisfies the equation

$$p(x, R^*) = \min_R \int_\Omega p(y, R^*)\, dP_{x,R}(y)$$

and (ii) for an arbitrary policy $R$,

$$\lim_{N\to\infty} p_1(x, R^N) = p(x, R),$$

then $p(x, R^*) \le p(x, R)$; i.e., $R^*$ is optimal.
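PROOF. Let $P^i_{x,R}(A) = \Pr\{x_i \in A \mid x_0 = x \text{ and } R \text{ is used at } x_0, x_1, \ldots, x_{i-1}\}$;
that is, $P^i_{x,R}(A) = \int_\Omega \cdots \int_\Omega \int_A dP_{x_{i-1},R}(x_i)\, dP_{x_{i-2},R}(x_{i-1}) \cdots dP_{x,R}(x_1)$. Then

(9) $$p_1(x, R^N) = \int_\Omega p(y, R^*)\, dP^N_{x,R}(y).$$

But $p(y, R^*) \le \int_\Omega p(z, R^*)\, dP_{y,R}(z)$ by (i). Using this inequality in (9),

(10) $$p_1(x, R^N) \le \int_\Omega \int_\Omega p(z, R^*)\, dP_{y,R}(z)\, dP^N_{x,R}(y) = p_1(x, R^{N+1}).$$

Thus

$$p(x, R^*) = p_1(x, R^0) \le p_1(x, R^1) \le p_1(x, R^2) \le \cdots \le p_1(x, R^N) \le \cdots$$

This sequence is non-decreasing and has a limit. By (ii),

$$p(x, R^*) \le \lim_{N\to\infty} p_1(x, R^N) = p(x, R).$$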
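
As a small numerical illustration of Theorem 1, the sketch below checks the conclusion on a toy chain. Everything in it is an assumption made for illustration and is not taken from the paper: four states 0 to 3, ruin set T = {0}, a safe absorbing state 3, and two hypothetical actions "a" and "b" at the interior states. The policy whose ruin probabilities satisfy condition (i) should then be no worse than any other stationary policy.

```python
from itertools import product

# Hypothetical toy model (not from the paper): states 0..3, T = {0} is "ruin",
# state 3 is a safe absorbing state, and two actions are available at 1 and 2.
ACTIONS = {
    "a": lambda x: {x + 1: 0.6, x - 1: 0.4},          # one step up or down
    "b": lambda x: {min(x + 2, 3): 0.3, x - 1: 0.7},  # bigger jump, riskier
}
INTERIOR = (1, 2)


def expected(p, x, action):
    """Integral of p(y) with respect to the one-step law of `action` at x."""
    return sum(prob * p[y] for y, prob in ACTIONS[action](x).items())


def ruin_probability(policy, sweeps=5000):
    """p(x, R): probability of ever entering T under a stationary policy."""
    p = [1.0, 0.0, 0.0, 0.0]
    for _ in range(sweeps):
        p = [1.0] + [expected(p, x, policy[x]) for x in INTERIOR] + [0.0]
    return p


def satisfies_condition_i(p, tol=1e-9):
    """Condition (i): p(x) equals the minimum over actions of the expectation."""
    return all(
        abs(p[x] - min(expected(p, x, a) for a in ACTIONS)) < tol
        for x in INTERIOR
    )


policies = [dict(zip(INTERIOR, choice)) for choice in product(ACTIONS, repeat=2)]
values = {tuple(pol.values()): ruin_probability(pol) for pol in policies}
optimal = [key for key, p in values.items() if satisfies_condition_i(p)]

for key, p in values.items():
    print(key, [round(v, 4) for v in p])
print("policies satisfying (i):", optimal)
```

In this toy chain every policy is absorbed with probability one, so condition (ii) is not an issue; the general theorem needs it precisely because that may fail on an arbitrary state space.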







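THEOREM 2. If there is a monotone sequence of sets in $\Omega$, $\Omega_1 \supset \Omega_2 \supset \cdots$, such
that (i) $\lim_{k\to\infty} \sup_{x \in \Omega_k} p(x, R^*) = 0$ and (ii) $P_{x,R}(T) \ge \gamma_k > 0$ for $x \in \Omega - \Omega_k$
and for every $R$, then $\lim_{N\to\infty} p_1(x, R^N) = p(x, R)$.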

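PROOF. Since $S^{n+1} \subset S^n$,

$$p(x, R) - p_1(x, R^N) = \lim_{n\to\infty} \{P(S^{N+n} \mid x, R^N) - P(S^{N+n} \mid x, R)\}$$

(11) $$= \lim_{n\to\infty} \{P(S^{N+n} \mid S^{N-1}, x, R^N) - P(S^{N+n} \mid S^{N-1}, x, R)\}\, P(S^{N-1} \mid x, R)$$

$$= \left[ \int_\Omega p(y, R)\, dP^*(y) - \int_\Omega p(y, R^*)\, dP^*(y) \right] P(S^{N-1} \mid x, R),$$

where $P^*(A) = \Pr\{x_N \in A \mid S^{N-1}, x_0 = x, \text{ and } R \text{ is used at } x_0, x_1, \ldots, x_{N-1}\}$.
Suppose $p(x, R) = 1$. Then $\lim_{N\to\infty} P(S^{N-1} \mid x, R) = 0$ and $\lim_{N\to\infty} p_1(x, R^N) = 1$.
Suppose $p(x, R) = a < 1$. Since

$$\left[ 1 - \int_\Omega p(y, R)\, dP^*(y) \right] P(S^{N-1} \mid x, R) = 1 - a,$$

and $P(S^{N-1} \mid x, R)$ is non-increasing and tends to $1 - a$,

(12) $$\lim_{N\to\infty} \int_\Omega p(y, R)\, dP^*(y) = 0.$$

It is also true that

$$\lim_{N\to\infty} \int_\Omega p(y, R^*)\, dP^*(y) = 0.$$

Let $\alpha$ and $\epsilon$ be arbitrary positive numbers. By (i) there exists a $k_\alpha$ such that for
$k \ge k_\alpha$, $p(x, R^*) \le \alpha/2$ for $x \in \Omega_k$. By (ii), $p(x, R) \ge \gamma_k > 0$ for $x \in \Omega - \Omega_k$.
From (12) there exists an $N_\epsilon$ such that for $N \ge N_\epsilon$, $\int_\Omega p(y, R)\, dP^*(y) \le \epsilon$.
Take $\epsilon = \gamma_{k_\alpha} \alpha/2$. Then

(13) $$\gamma_{k_\alpha} \frac{\alpha}{2} \ge \int_{\Omega - \Omega_{k_\alpha}} p(y, R)\, dP^*(y) \ge \gamma_{k_\alpha} \int_{\Omega - \Omega_{k_\alpha}} dP^*(y)$$

and

(14) $$\int_\Omega p(y, R^*)\, dP^*(y) \le \int_{\Omega - \Omega_{k_\alpha}} dP^*(y) + \frac{\alpha}{2} \int_{\Omega_{k_\alpha}} dP^*(y).$$

Canceling $\gamma_{k_\alpha}$ in (13) and using (14), $\int_\Omega p(y, R^*)\, dP^*(y) \le \alpha$. This completes
the proof of Theorem 2.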

To apply Theorems 1 and 2 to the model under consideration, Q = [0, ~), 
T consists of the single point x = 0, and P,»,, is taken to be Gz, defined by (1). 
The sequence of sets 2,, 2, ---, can be a sequence of intervals {[z;, © )}, 
Lian — % 26> 0,7 = 1, 2, ---. Since p(z, R*) = e ™, condition (i) of 
Theorem 2 is satisfied, as is condition (ii), since Pz,,(7) is then at least e “ for 
xz <Q — QQ. Theorem 1 may then be applied and R* defined by (5) is optimal. 







4. Acknowledgments. The writer is indebted to R. G. Miller, Jr., for a number 
of invaluable comments and suggestions, and for a critical reading of the manu- 
script. 


REFERENCES 
1. Richard Bellman, Dynamic Programming, Princeton University Press, Princeton, N.J.,
1957.
2. David Blackwell, "On the functional equation of dynamic programming," J. Math.
Analysis and Applications, Vol. 2 (1961).
3. David G. Kendall, "Some problems in the theory of dams," J. Roy. Stat. Soc., Ser. B,
Vol. 19 (1957), pp. 181-233.




FIRST PASSAGE TIME FOR A PARTICULAR GAUSSIAN PROCESS 
By D. SLEPIAN 
Bell Telephone Laboratories, Murray Hill, New Jersey 


1. Introduction. Let $x(t)$ be a stationary Gaussian process with $Ex(t) = 0$ and
$E[x(t)x(t')] = \rho(t - t')$. Denote by $Q_a(T \mid x_0)\, dT$ the conditional probability
that for $t > 0$, $x(t)$ first assumes the value $a$ in the interval $T < t \le T + dT$
given that $x(0) = x_0$. It is well known that the determination of the first passage
time probability $Q_a(T \mid x_0)\, dT$ is not an easy matter in general. To the
author's knowledge, $Q_a(T \mid x_0)$ is known explicitly for stationary Gaussian
processes with continuous spectral densities only in the Markovian case
$\rho(\tau) = e^{-|\tau|}$. See [1], [2], [3] and [4]. This note points out that an elementary
solution exists for the process with covariance

(1) $$\rho(\tau) = \begin{cases} 1 - |\tau|, & |\tau| \le 1, \\ 0, & |\tau| \ge 1, \end{cases}$$

for $0 < T \le 1$.

2. Markoff-Like Property. The determination of the first passage time probability
density $Q_a(T \mid x_0)$ for the process with covariance (1) follows from a
peculiar Markoff-like property it possesses which may be described roughly as
follows. Let $0 < t_1 < t_2 < 1$ be two instants in the unit interval. Denote the
open interval $(t_1, t_2)$ by $A$ and the set $(0, t_1) \cup (t_2, 1)$ by $B$. Then for the
process at hand, given the values of $x(t_1)$ and $x(t_2)$, events defined on $A$ are
statistically independent of events defined on $B$.

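More precisely, we show the following. Let $0 < t_1 < t_2 < \cdots < t_n < 1$ and let
$1 \le k < \ell \le n$. Then

(2) $$p(x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_{\ell-1}, x_{\ell+1}, \ldots, x_n \mid x_k, x_\ell)
= p(x_1, \ldots, x_{k-1}, x_{\ell+1}, \ldots, x_n \mid x_k, x_\ell)\; p(x_{k+1}, \ldots, x_{\ell-1} \mid x_k, x_\ell).$$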


Received August 20, 1960. 







Here we have set $x_j = x(t_j)$, $j = 1, 2, \ldots, n$, and have followed the
time-honored (but often deplored) practice of denoting the conditional probability
density of $x_i$ given $x_j$ by $p(x_i \mid x_j)$. The assumption of separability for the
process leads from (2) to the statement of conditional independence of events
defined on $A$ and $B$.

Equation (2) can be established readily by a direct calculation. Let
$z_1 = x_1 + x_n$, $z_j = x_j - x_{j-1}$, $j = 2, 3, \ldots, n$. One easily verifies from
(1) that $Ez_i = 0$, $Ez_i z_j = 0$ for $i \ne j$, $Ez_1^2 = 2[2 - (t_n - t_1)]$, $Ez_j^2 = 2(t_j - t_{j-1})$,
$i = 1, 2, \ldots, n$, $j = 2, 3, \ldots, n$. The Jacobian $\partial(z_1, \ldots, z_n)/\partial(x_1, \ldots, x_n)$ has the value 2.
Therefore,

(3) $$p(x_1, \ldots, x_n) = 2\,(2\pi)^{-n/2}\,[2(2 - t_n + t_1)]^{-1/2} \prod_{j=2}^{n} [2(t_j - t_{j-1})]^{-1/2}
\exp\left\{ -\frac{1}{2} \left[ \frac{(x_1 + x_n)^2}{2(2 - t_n + t_1)} + \sum_{j=2}^{n} \frac{(x_j - x_{j-1})^2}{2(t_j - t_{j-1})} \right] \right\}.$$

The factors occurring in (2) are ratios of probability densities each of the form
(3). Direct substitution results in the verification of (2).
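
The moment claims behind this calculation can be checked numerically. The sketch below assumes the triangular covariance $\rho(\tau) = 1 - |\tau|$ for $|\tau| \le 1$ (the form consistent with the stated second moments) and an arbitrary, purely illustrative set of instants in the unit interval; it forms the covariance matrix of $z_1 = x_1 + x_n$, $z_j = x_j - x_{j-1}$ and confirms that it is diagonal with the stated variances.

```python
def rho(tau):
    """Triangular covariance: 1 - |tau| for |tau| <= 1, zero otherwise."""
    return max(0.0, 1.0 - abs(tau))


def transformed_covariance(times):
    """Covariance matrix of z_1 = x_1 + x_n and z_j = x_j - x_{j-1}, j >= 2."""
    n = len(times)
    cov = [[rho(ti - tj) for tj in times] for ti in times]
    # Rows of the linear map M taking (x_1, ..., x_n) to (z_1, ..., z_n).
    m = [[0.0] * n for _ in range(n)]
    m[0][0] += 1.0
    m[0][n - 1] += 1.0
    for j in range(1, n):
        m[j][j] = 1.0
        m[j][j - 1] = -1.0
    mc = [[sum(m[i][k] * cov[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # M * cov * M^T
    return [[sum(mc[i][k] * m[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


times = [0.05, 0.2, 0.45, 0.7, 0.95]  # illustrative instants, not from the paper
cz = transformed_covariance(times)
for row in cz:
    print([round(v, 12) for v in row])
```

The off-diagonal entries vanish only because all instants lie within one unit of each other, which is the restriction under which the Markoff-like property is claimed.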
Let $0 < t_1 < t_2 < t_3 < 1$. The process at hand has the curious property that

$$p(x_2, x_4 \mid x_1, x_3) = p(x_2 \mid x_1, x_3)\, p(x_4 \mid x_1, x_3)$$

if $t_3 < t_4 \le t_1 + 1$ or if $t_4 \ge t_3 + 1$, but this conditional independence does not
hold if $t_1 + 1 < t_4 < t_3 + 1$.

We note in passing that the process under consideration can also be written 
as x(t) = y(t + 1) — y(t), where y(t) is the Wiener process, and that the 
Markoff-like property just derived can be obtained from known properties of 
the Wiener process. 


3. First Passage Time. The first passage time probability density for this
process can be derived from an integral relation of the sort used by Siegert [4].
The process can pass from a value $x_0 > a$ at time $t = 0$ to a value $x_T \le a$ at
time $t = T$ only if at some time $\theta$, with $0 < \theta < T$, the process assumes the
value $a$ for the first time. If then $R_a(x_T \mid x_0, \theta)\, dx_T$ is the conditional probability
that $x_T \le x(T) \le x_T + dx_T$ given that $x(0) = x_0$ and given that for $t \ge 0$
the process first assumes the value $a$ for $\theta \le t \le \theta + d\theta$, we have

$$p(x_T \mid x_0) = \int_0^T d\theta\, Q_a(\theta \mid x_0)\, R_a(x_T \mid x_0, \theta), \qquad x_T \le a < x_0.$$

If now $T < 1$, $R_a(x_T \mid x_0, \theta) = p(x_T \mid x_0, x_\theta = a)$ because of the Markoff-like
property of $x(t)$ already described. We have then

(4) $$p(x_T \mid x_0) = \int_0^T d\theta\, Q_a(\theta \mid x_0)\, p(x_T \mid x_0, x_\theta = a), \qquad x_T \le a < x_0,$$

a relationship in which $Q_a(\theta \mid x_0)$ is the only quantity not known.
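Equation (3) can be used to determine the conditional densities appearing in
(4). After substituting for these quantities and cancelling some nonzero factors,
one finds

$$\frac{\exp\left[-\dfrac{(x_T - x_0)^2}{4T}\right]}{(2\pi T)^{1/2}}
= \int_0^T d\theta\, \left[\tfrac{1}{2}(2 - \theta)\right]^{1/2} \exp\left[\frac{(x_0 + a)^2}{4(2 - \theta)} - \frac{x_0^2}{2}\right] Q_a(\theta \mid x_0)\,
\frac{\exp\left[-\dfrac{(x_T - a)^2}{4(T - \theta)}\right]}{[2\pi(T - \theta)]^{1/2}}.$$

Integrate on $x_T$ from $-\infty$ to $a$ to obtain

$$\frac{1}{\pi^{1/2}} \int_{-\infty}^{(a - x_0)/(2T)^{1/2}} e^{-u^2/2}\, du
= \frac{1}{2^{1/2}} \int_0^T d\theta\, \left[\tfrac{1}{2}(2 - \theta)\right]^{1/2} \exp\left[\frac{(x_0 + a)^2}{4(2 - \theta)} - \frac{x_0^2}{2}\right] Q_a(\theta \mid x_0).$$

Then $Q_a(T \mid x_0)$ can be obtained directly by differentiation with respect to $T$.
A similar derivation can be carried out under the assumption $x_0 < a$. The
combined result is

$$Q_a(T \mid x_0) = \frac{|x_0 - a|}{T\,[2\pi T(2 - T)]^{1/2}} \exp\left\{ -\frac{1}{2}\, \frac{[x_0(1 - T) - a]^2}{T(2 - T)} \right\}, \qquad -\infty < x_0 < \infty, \quad 0 < T \le 1.$$

The author has been unable to obtain an expression for $Q_a(T \mid x_0)$ valid for
$T > 1$.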
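
A quick numerical sanity check of the closing formula is sketched below. It assumes $x_0 > a$ and tests the integrated relation in the normalized form $\Phi\big((a - x_0)/(2T)^{1/2}\big) = \int_0^T \tfrac{1}{2}[\tfrac{1}{2}(2-\theta)]^{1/2} \exp[(x_0+a)^2/(4(2-\theta)) - x_0^2/2]\, Q_a(\theta \mid x_0)\, d\theta$, where $\Phi$ is the standard normal distribution function. The function names and the chosen values of $x_0$, $a$, $T$ are illustrative assumptions, and the midpoint rule is only a convenient stand-in for exact integration.

```python
import math


def first_passage_density(T, x0, a):
    """Q_a(T | x0) from the closed-form result, intended for 0 < T <= 1."""
    s = T * (2.0 - T)
    return (abs(x0 - a) / (T * math.sqrt(2.0 * math.pi * s))
            * math.exp(-0.5 * (x0 * (1.0 - T) - a) ** 2 / s))


def weight(theta, x0, a):
    """Factor multiplying Q_a(theta | x0) in the integrated relation."""
    return (0.5 * math.sqrt(0.5 * (2.0 - theta))
            * math.exp((x0 + a) ** 2 / (4.0 * (2.0 - theta)) - 0.5 * x0 ** 2))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def both_sides(T, x0, a, steps=20000):
    """Return (left side, right side) of the integrated relation, x0 > a."""
    h = T / steps
    rhs = h * sum(
        weight(th, x0, a) * first_passage_density(th, x0, a)
        for th in ((i + 0.5) * h for i in range(steps))
    )
    lhs = normal_cdf((a - x0) / math.sqrt(2.0 * T))
    return lhs, rhs


lhs, rhs = both_sides(T=0.8, x0=1.0, a=0.0)
print(lhs, rhs)
```

Agreement of the two numbers only confirms internal consistency of the formulas for $T \le 1$; it says nothing about $T > 1$, where the Markoff-like property is not available.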


REFERENCES 

[1] Darling, D. A., and Siegert, A. J. F., "The first passage problem for a continuous
Markov process," Ann. Math. Stat., Vol. 24 (1953), pp. 624-639.

[2] Rice, S. O., "Distribution of the duration of fades in radio transmission," Bell System
Tech. J., Vol. 37 (1958), pp. 620-623, 630-631.

[3] Siegert, Arnold J. F., "On the zeros of Markoffian random functions," Rand Corporation
Memorandum RM-447, September 5, 1950.

[4] Siegert, A. J. F., "On the first passage time probability problem," Phys. Rev., Vol. 81
(1951), pp. 617-623.


——— 
A NOTE ON THE ERGODIC THEOREM OF INFORMATION THEORY¹


By K. L. Chung


Syracuse University


The purpose of this note is to extend the result of Breiman [1], [2] to an infinite 
alphabet, or equivalently, the result of Carleson [3] to convergence with proba- 
bility one. 


Let $\{\ldots, x_{-1}, x_0, x_1, \ldots\}$ be a stationary stochastic process taking values in
a countable "alphabet" $\{a_i, i = 1, 2, \ldots\}$. Let

$$p(a_{i_1}, \ldots, a_{i_n}) = P\{x_k = a_{i_k},\ k = 1, \ldots, n\},$$


Received October 22, 1960. 
¹ This research was supported in part by the Office of Scientific Research of the United
States Air Force, under Contract No. AF 49(638)-265.





and write $p_i = p(a_i)$ for short. Denoting by "lg" the logarithm to the base 2,
we set

$$g_0 = \lg \frac{1}{p(x_0)}, \qquad g_k = \lg \frac{p(x_{-k}, \ldots, x_{-1})}{p(x_{-k}, \ldots, x_{-1}, x_0)},$$

$$g_0^{(i)} = \lg \frac{1}{p(a_i)}, \qquad g_k^{(i)} = \lg \frac{p(x_{-k}, \ldots, x_{-1})}{p(x_{-k}, \ldots, x_{-1}, a_i)}.$$

We have then

$$g_k \ge E\{g_{k+1} \mid x_0, \ldots, x_{-k}\}$$

and

$$E\{g_0\} = -E\{\lg p(x_0)\}.$$

Hence $\{g_k, k = 0, 1, 2, \ldots\}$ is a nonnegative lower semimartingale provided that
the "entropy" is finite:

(1) $$H = -E\{\lg p(x_0)\} = -\sum_{i=1}^{\infty} p_i \lg p_i < \infty.$$

Hence by the martingale convergence theorem, $g_k$ converges with probability
one as $k \to \infty$. To prove the ergodic theorem, namely that with probability one

(2) $$\lim_{n\to\infty} \frac{1}{n} \lg p(x_0, \ldots, x_{n-1}) = -H,$$

it is sufficient, following [1], to show that

(3) $$E\{\sup_{0 \le k < \infty} g_k\} < \infty.$$


The inequality (3) implies also that the sequence $\{g_k, k = 0, 1, 2, \ldots\}$ is
uniformly integrable, hence its convergence with probability one implies its
convergence in mean (of order one). From this it follows (see [4]) that (2) holds also
in mean. The last assertion has already been proved by Carleson [3]. We state
our result as follows.


THEOREM. (1) implies (3) and consequently (2) both in mean and with proba- 
bility one. 


PROOF. Let $\omega$ denote the sample point and define for each nonnegative integer $m$

$$E_k(m) = \{\omega : \sup_{0 \le j < k} g_j < m;\ g_k \ge m\},$$
$$E_k^{(i)}(m) = \{\omega : \sup_{0 \le j < k} g_j^{(i)} < m;\ g_k^{(i)} \ge m\},$$
$$Z_i = \{\omega : x_0 = a_i\}.$$

We may suppose that the sequence $\{p_i, i = 1, 2, \ldots\}$ is nonincreasing since
this can always be achieved by relabelling the alphabet. Let $f(m) \ge 0$ and write

$$P\{E_k(m)\} = \sum_{i=1}^{\infty} P\{E_k(m) \cap Z_i\}
= \sum_{i \le f(m)} P\{E_k(m) \cap Z_i\} + \sum_{i > f(m)} P\{E_k(m) \cap Z_i\}.$$

We have, since $g_k \ge m$ on $E_k(m)$,

$$P\{E_k(m) \cap Z_i\} \le 2^{-m} P\{E_k^{(i)}(m)\};$$

(4) $$\sum_{k=0}^{\infty} \sum_{i \le f(m)} P\{E_k(m) \cap Z_i\} \le 2^{-m} \sum_{i \le f(m)} \sum_{k=0}^{\infty} P\{E_k^{(i)}(m)\} \le \frac{f(m)}{2^m}.$$

On the other hand, it is plain that

(5) $$\sum_{k=0}^{\infty} \sum_{i > f(m)} P\{E_k(m) \cap Z_i\} \le \sum_{i > f(m)} P\{Z_i\} = \sum_{i > f(m)} p_i.$$

Let $f^{-1}(i)$ be the number of $m$ such that $f(m) < i$, then
$f^{-1}(i) \le 1 + \max\{m : f(m) < i\}$. Summing (4) and (5) over all $m$, we obtain

(6) $$\sum_{m=0}^{\infty} \sum_{k=0}^{\infty} P\{E_k(m)\} \le \sum_{m=0}^{\infty} \frac{f(m)}{2^m} + \sum_{i=1}^{\infty} f^{-1}(i)\, p_i.$$

Now choose $f(m) = 2^m/(m + 1)^2$; a simple computation shows that there exist
two positive constants $A$ and $B$ such that $f^{-1}(i) \le A \lg i + B$ for all $i \ge 1$.
Since $\{p_i\}$ is nonincreasing, we have $i p_i \le 1$ so that

$$\sum_{i=1}^{\infty} f^{-1}(i)\, p_i \le \sum_{i=1}^{\infty} \left( A \lg \frac{1}{p_i} + B \right) p_i = AH + B.$$

Hence we have by (6),

$$\sum_{m=0}^{\infty} \sum_{k=0}^{\infty} P\{E_k(m)\} \le \frac{\pi^2}{6} + AH + B.$$

Finally,

$$E\{\sup_{0 \le k < \infty} g_k\} \le \sum_{m=0}^{\infty} P\{\sup_{0 \le k < \infty} g_k \ge m\} = \sum_{m=0}^{\infty} \sum_{k=0}^{\infty} P\{E_k(m)\},$$

which completes the proof that (1) implies (3).
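
The "simple computation" behind the bound on $f^{-1}$ can be illustrated numerically. The sketch below assumes the choice $f(m) = 2^m/(m+1)^2$ and counts, with exact integer arithmetic, the number of $m \ge 0$ with $f(m) < i$. The constants $A = 2$ and $B = 6$ are hypothetical values picked for this illustration; the note itself only asserts that some positive constants exist.

```python
import math


def f_inverse(i, m_max=200):
    """Number of m >= 0 with 2**m / (m + 1)**2 < i, using exact integers.

    m_max = 200 is an assumed search cap, ample for the range of i used here.
    """
    return sum(1 for m in range(m_max) if 2 ** m < i * (m + 1) ** 2)


A, B = 2.0, 6.0  # hypothetical constants chosen for this illustration

for i in (1, 2, 10, 100, 1000):
    print(i, f_inverse(i), round(A * math.log2(i) + B, 3))
```

Logarithmic growth of $f^{-1}$ is what lets the last sum in (6) be bounded by a multiple of the entropy $H$.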


REFERENCES 

[1] Leo Breiman, "The individual ergodic theorem of information theory," Ann. Math.
Stat., Vol. 28 (1957), pp. 809-811.

[2] Leo Breiman, "A correction to 'The individual ergodic theorem of information theory',"
Ann. Math. Stat., Vol. 31 (1960), pp. 809-810.

[3] L. Carleson, "Two remarks on the basic theorems of information theory," Math.
Scand., Vol. 6 (1958), pp. 175-180.

[4] A. Feinstein, Foundations of Information Theory, McGraw-Hill, New York, 1958.







REMARK CONCERNING TWO-STATE SEMI-MARKOV PROCESSES 


By C. Derman¹
Columbia University 


Let $\{X_1, Y_1, X_2, Y_2, \ldots\}$ be a sequence of independent non-negative random
variables where the $X$'s have a common distribution function $F$ and the $Y$'s, a
common distribution function $G$. Define

$$S_0 = 0, \qquad S_k = \sum_{i=1}^{k} (X_i + Y_i), \qquad S_k' = S_k + X_{k+1},$$

and, for each $t$, $0 \le t < \infty$,

$$Z(t) = \begin{cases} 1 & \text{if } S_k \le t < S_k' \text{ for some } k, \\ 0 & \text{otherwise.} \end{cases}$$

$Z(t)$ is a two-state Semi-Markov Process (see Lévy [2], Pyke [3] and [5], Smith
[6]). Such a process arises in work sampling, and also in counter models as treated,
e.g., by Pyke [4].

Let $P(t) = P(Z(t) = 1)$. From a result of Smith ([7], Theorem 1) it follows
that, if $H = F * G$ (the distribution function of $X + Y$) is non-lattice, then

$$\lim_{t\to\infty} P(t) = \begin{cases} \dfrac{EX}{EX + EY} & \text{if } EX < \infty,\ EY < \infty, \\ 0 & \text{if } EX < \infty,\ EY = \infty. \end{cases}$$

Our remark is directed toward the behavior of $P(t)$ for large $t$, allowing that both
$EX$ and $EY$ can be infinite. It consists in the following observation:

(1) $$\lim_{T\to\infty} \frac{1}{T} \int_0^T P(t)\, dt = \lim_{s\to 0} \frac{1 - f(s)}{1 - f(s)g(s)}, \qquad s > 0,$$

if either of the limits exists, where $f(s)$ and $g(s)$ are the Laplace-Stieltjes transforms
of $F$ and $G$. The truth of this remark is established by starting from a convolution
representation of $P(t)$ (see [4]), taking Laplace transforms, and bringing to
bear the Abelian and Tauberian theorems on p. 182 and p. 192 of [9].

If f(s) and g(s) are known, (1) is directly applicable. The following examples 
demonstrate the applicability of (1) for other situations. 

(a) Suppose $g(s) = f^k(s)$ for some $k = 1, 2, \ldots$. Then the limit in (1) exists
and is equal to $1/(k + 1)$.
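
The transform ratio $[1 - f(s)]/[1 - f(s)g(s)]$ is easy to evaluate for small $s$ when $f$ and $g$ are known in closed form. The sketch below uses exponential distributions, whose Laplace-Stieltjes transform $\lambda/(\lambda + s)$ is standard; the rates, the value of $s$, and the helper names are illustrative assumptions. It checks the finite-mean value $EX/(EX + EY)$ and the value $1/(k+1)$ of example (a).

```python
def exponential_lst(rate):
    """Laplace-Stieltjes transform of an exponential law with the given rate."""
    return lambda s: rate / (rate + s)


def power(f, k):
    """Transform of the k-fold convolution: s -> f(s)**k."""
    return lambda s: f(s) ** k


def limit_ratio(f, g, s=1e-6):
    """Approximate the limit as s -> 0 of (1 - f(s)) / (1 - f(s) g(s))."""
    return (1.0 - f(s)) / (1.0 - f(s) * g(s))


f = exponential_lst(0.5)        # EX = 2 (assumed example)
g = exponential_lst(1.0 / 3.0)  # EY = 3 (assumed example)
print(limit_ratio(f, g))            # close to EX / (EX + EY) = 0.4
print(limit_ratio(f, power(f, 3)))  # close to 1 / (3 + 1) = 0.25
```

Evaluating at one small $s$ is only a heuristic stand-in for the limit; for heavy-tailed $F$ and $G$ the asymptotic form of $1 - f(s)$ would have to be used instead.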


Received October 3, 1960; revised November 3, 1960. 
¹ Work sponsored by the Office of Naval Research under Contract number Nonr 266(55)
(Nr 042-099).







(b) Suppose, ast > ~, 


[ (1 — F(x) de ~ A 


. 0<a<sl, A>O0O, 


0 r(2 = a) 


t 
[ (1 — G(x) de ~ ae nt 0<pS1, B> 0. 
0 


ra—s! ’ 

It can be shown, using the Abelian theorem on p. 182 of [9], that the limit in (1)
is $A/(A + B)$ (if $\alpha = \beta$), 1 (if $\alpha > \beta$), and 0 (if $\alpha < \beta$), a result also
obtainable from [8].

The limit (1) could be studied from the point of view of Darling and Kac [1].
Possibly, their results would yield conditions on $F$ and $G$ for (1) to hold.

The behavior of P(t) itself, for large t, does not seem to be ascertainable by 
the method given here. 


REFERENCES 

[1] Darling, D. and M. Kac, "On occupation times for Markoff processes," Trans. Amer.
Math. Soc., Vol. 84 (1957), pp. 444-458.

[2] Lévy, P., "Processus semi-Markoviens," Proc. Int. Congr. Math., Amsterdam, Vol. 3
(1954), pp. 416-426.

[3] Pyke, Ronald, "Markov renewal processes: definitions and preliminary properties,"
Technical Report under Contract Nonr 266(59), Columbia University, 1958. To
appear in Ann. Math. Stat.

[4] Pyke, Ronald, "On renewal processes related to type I and type II counter models,"
Ann. Math. Stat., Vol. 29 (1958), pp. 737-754.

[5] Pyke, Ronald, "Markov renewal processes with finitely many states," Technical
Report under Nonr 266(59), Columbia University, 1958. To appear in Ann.
Math. Stat.

[6] Smith, W. L., "Regenerative stochastic processes," Proc. Royal Soc. London, Ser. A,
Vol. 232 (1955), pp. 6-31.

[7] Smith, W. L., "Asymptotic renewal theorems," Proc. Roy. Soc. Edinburgh, Ser. A, Vol.
64, Part I (1954), pp. 9-48.

[8] Takács, L., "On a sojourn time problem," Teor. Veroyatnost. i Primenen, Vol. 3 (1958),
pp. 61-69.

[9] Widder, D. V., The Laplace Transform, Princeton University Press, Princeton, 1946.







AN EXAMPLE OF AN ANCILLARY STATISTIC AND THE COMBINATION 
OF TWO SAMPLES BY BAYES’ THEOREM 


By D. A. Sprott
University of Waterloo, Ontario 


1. Origin of the example. In [1], an example was given in which a fiducial 
distribution served as a distribution a priori to be combined with a different set 
of data (not capable of yielding probability statements), by Bayes’ Theorem. 
In [2], it was shown that this procedure of combining samples, when each sample 
yielded a fiducial distribution, could lead to a contradiction. In [3], an attempt 


Received November 21, 1960 







was made to show why these contradictions arise and how to eliminate them. 
Two conditions that all distributions a posteriori must fulfil, were stated. From 
these, the following necessary conditions were derived: the two samples to be 
combined by Bayes’ Theorem must have sufficient statistics following: 

(1) the normal distributions with means $\theta$, $c\theta + k$, or

(2) the gamma distribution with parameters $\theta$, $(c\theta)^k$, or

(3) the normal distribution with mean $\theta$ and the gamma distribution with
parameter $c \exp k\theta$,
where c and k are known constants. Cases (1) and (2) were also shown to be 
sufficient conditions. It remains to show that case (3) is a sufficient condition 
(i.e., no contradiction arises). 


2. Derivation of an ancillary statistic and the corresponding fiducial distribution.
Suppose the sufficient statistics $T_1$ and $T_2$ have densities

$$L_1(T_1, \theta)\, dT_1 = (2\pi n)^{-1/2} \exp[-(T_1 - n\theta)^2/2n]\, dT_1,$$

$$L_2(T_2, \theta)\, dT_2 = [T_2^{m-1} c^m/\Gamma(m)] \exp[mk\theta - ce^{k\theta} T_2]\, dT_2.$$

Thus, the simultaneous distribution of $T_1$ and $T_2$ is

$$[c^m T_2^{m-1}/(2\pi n)^{1/2}\, \Gamma(m)] \exp[mk\theta - ce^{k\theta} T_2 - (T_1 - n\theta)^2/2n]\, dT_1\, dT_2.$$

Making the transformation

$$T_2 = \exp[-k(U_1 + U_2)], \qquad T_1 = nU_2,$$

the simultaneous distribution of $U_1$, $U_2$ is

$$[c^m nk/(2\pi n)^{1/2}\, \Gamma(m)] \exp[-mk(U_1 + U_2 - \theta) - ce^{-k(U_1 + U_2 - \theta)} - \tfrac{1}{2} n(U_2 - \theta)^2]\, dU_1\, dU_2.$$

Integrating with respect to $U_2$, the distribution of $U_1$ is

$$[c^m nk\, I(U_1)/(2\pi n)^{1/2}\, \Gamma(m)]\, dU_1,$$

where

$$I(U_1) = \int_{-\infty}^{\infty} \exp[-mk(U_1 + w) - ce^{-k(U_1 + w)} - \tfrac{1}{2} n w^2]\, dw,$$

and is independent of $\theta$. Hence $U_1 = -T_1/n - (\log T_2)/k$ is an ancillary statistic.
The distribution of $U_2$ given $U_1$ is

(1) $$L(U_2 \mid U_1, \theta) = \{\exp[-mk(U_1 + U_2 - \theta) - ce^{-k(U_1 + U_2 - \theta)} - \tfrac{1}{2} n(U_2 - \theta)^2]\}/I(U_1).$$

Using (1), the corresponding fiducial distribution is given by

(2) $$f(\theta \mid U_1, U_2) = \left| \frac{\partial}{\partial \theta} \int_{u = -\infty}^{U_2} L(u \mid U_1, \theta)\, du \right|.$$





3. Derivation of distribution a posteriori by Bayes' Theorem. The fiducial
distribution based on $T_1$ is

$$h(\theta \mid T_1) = (n/2\pi)^{1/2} \exp[-(T_1 - n\theta)^2/2n].$$

Using this as the distribution a priori, to be used in conjunction with $T_2$, gives
as distribution a posteriori,

(3) $$b(\theta \mid T_1, T_2) = \{\exp[mk\theta - ce^{k\theta} T_2 - (T_1 - n\theta)^2/2n]\}/I(T_1, T_2),$$

where $I(T_1, T_2) = \int_{-\infty}^{\infty} \exp[mk\theta - ce^{k\theta} T_2 - (T_1 - n\theta)^2/2n]\, d\theta$. Hence

$$I(T_1, T_2) = [\exp mk(U_1 + U_2)]\, I(U_1),$$

and so

(4) $$b(\theta \mid U_1, U_2) = L(U_2 \mid U_1, \theta).$$

From (1) and (4) it can be seen that

$$\frac{\partial}{\partial \theta} \int_{u = -\infty}^{U_2} L(u \mid U_1, \theta)\, du = -L(U_2 \mid U_1, \theta) = -b(\theta \mid U_1, U_2),$$

and so from (2) $f(\theta \mid U_1, U_2) = b(\theta \mid U_1, U_2)$. Thus, the fiducial distribution
based on the combined sample is the same as the a posteriori distribution
obtained on combining the samples by Bayes' Theorem, using the fiducial
distribution based on one of the samples as a distribution a priori. Thus all three
conditions stated at the first are sufficient as well as necessary.
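
The normalizing identity $I(T_1, T_2) = [\exp mk(U_1 + U_2)]\, I(U_1)$, on which the agreement of the two distributions rests, can be checked numerically. The sketch below uses arbitrary illustrative values of $m$, $k$, $c$, $n$, $T_1$, $T_2$ (assumptions, not from the note) and a plain midpoint rule over a truncated range in place of the infinite integrals.

```python
import math


def integrate(func, lo, hi, steps=40000):
    """Midpoint rule on [lo, hi]; a simple stand-in for exact integration."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


m, k, c, n = 2, 1.0, 1.0, 3  # illustrative parameter values (assumptions)
T1, T2 = 1.5, 0.8            # illustrative observed statistics (assumptions)
U2 = T1 / n
U1 = -T1 / n - math.log(T2) / k  # the ancillary statistic


def posterior_kernel(theta):
    """Integrand of I(T1, T2)."""
    return math.exp(m * k * theta - c * math.exp(k * theta) * T2
                    - (T1 - n * theta) ** 2 / (2 * n))


def ancillary_kernel(w):
    """Integrand of I(U1)."""
    return math.exp(-m * k * (U1 + w) - c * math.exp(-k * (U1 + w))
                    - 0.5 * n * w * w)


half_width = 15.0  # the integrands are negligible beyond this range
I_T = integrate(posterior_kernel, U2 - half_width, U2 + half_width)
I_U = integrate(ancillary_kernel, -half_width, half_width)
print(I_T, math.exp(m * k * (U1 + U2)) * I_U)
```

The two printed values agree because the substitution $w = U_2 - \theta$ carries one integrand into the other up to the factor $\exp mk(U_1 + U_2)$.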


REFERENCES 
[1] Ronald A. Fisher, Statistical Methods and Scientific Inference, Edinburgh, Oliver and
Boyd, 1956.
[2] D. V. Lindley, "Fiducial distribution and Bayes' theorem," J. Roy. Stat. Soc., Ser. B,
Vol. 20 (1958), pp. 102-107.
[3] D. A. Sprott, "Necessary conditions for distributions a posteriori," J. Roy. Stat. Soc.,
Ser. B, Vol. 22 (1960), pp. 312-318.





CORRECTION NOTES 


CORRECTIONS TO 
“SADDLEPOINT METHODS FOR THE MULTINOMIAL 
DISTRIBUTIONS” 


By I. J. Good


Admiralty Research Laboratory, Teddington, Middlesex 
Corrections to the paper of the above title (Ann. Math. Stat., Vol. 28 (1957), 
pp. 861-81) are stated in another paper appearing in this issue, ‘“The multivariate 
saddlepoint method and chi-squared for the multinomial distribution,’ Vol. 32 
(1961), pp. 535-548. 
CORRECTIONS TO 
“A RELATIONSHIP BETWEEN HODGES’ BIVARIATE SIGN TEST 
AND A NON-PARAMETRIC TEST OF DANIELS” 


By Bruce M. Hill
Stanford University 


On page 1191, line 3 of this article (Ann. Math. Stat., Vol. 31 (1960), pp. 1190- 
1192), replace Pr{m > m,} by Pr{m S m,}. 


On page 1192, line 15, replace P;(£, 7) by Pi(n/é, & + 7). 

On page 1192, lines 23, 24, and 25 should read “and #; = ¥%, + --- + ¥;, 
with the ¥; independent and ¥; taking on the values +1 and —1 with proba- 
bilities P;(m;/&; , & + n;) and 1 — Pj(nj/t;, & + n;),” respectively. 




CORRECTIONS TO 
“THE THEORY OF PROBABILITY DISTRIBUTIONS OF POINTS 
ON A LATTICE” 


By P. V. Krishna Iyer
University of Oxford 

On page 206 of the above-titled article (Ann. Math. Stat., Vol. 21 (1950) ) the 
following corrections should be made: 

(a) In the columns for 4 and 8 black points, the number of configurations for 
8, 9 and 10 B-W joins is 117, 108 and 56 respectively instead of 119, 104 and 58; 

(b) In the column for 6 black points, the number of B-W joins is 160, 178 and 
196 instead of 156, 186 and 192 respectively. 

I am grateful to Dr. Stephen G. Brush of the Lawrence Radiation Laboratory, 
Livermore, California for bringing this to my notice. 



CORRECTIONS TO 
“A THEOREM ON FACTORIAL MOMENTS AND ITS APPLICATIONS” 
By P. V. Krishna Iyer
Defense Science Laboratory, New Delhi 


On page 256 of the above-titled article (Ann. Math. Stat., Vol. 29 (1958) ) the 
following corrections should be made: (a) line 17, insert (5) before [X''X]; (b) 
line 22, in place of + s! (>> TyFt, *** L1,), read + s!s(>_ ti,21 -** L1,). 




CORRECTION TO DECEMBER MEETING ANNOUNCEMENT 


The International Symposium on the Transmission and Processing of In- 


formation, which it was announced in the December, 1960 Annals would be 
held at Massachusetts Institute of Technology from September 6-8, 1961, has 
been cancelled. 




CORRECTION TO 
“PROBABILITY CONTENT OF REGIONS UNDER SPHERICAL 
NORMAL DISTRIBUTIONS, II: THE DISTRIBUTION OF THE 
RANGE IN NORMAL SAMPLES” 


By Harold Ruben
Columbia University 


The title of the above article (Ann. Math. Stat., Vol. 31 (1960), pp. 1113-1121) 
was incorrectly given on both the December cover and in the 1960 Annual 
Index. It should read as stated above. 





ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Eastern Regional Meeting of the Institute, April 21-22 
1961. Additional abstracts appeared in the March, 1961 issue.) 




6. Tables of Minimum Functions for Generating Galois Fields GF($p^n$). J. D.
ALANEN, Case Institute of Technology. (Introduced by I. M. Chakravarti.)


A polynomial $f(x)$ of degree $n$ irreducible in the field GF($p$), where $p$ is a prime number,
is called a minimum function if a root $\omega$ of the equation $f(x) = 0$ serves as a primitive
element of GF($p^n$), that is, $\omega^0 = 1, \omega, \omega^2, \ldots, \omega^{p^n - 2}$ are the $p^n - 1$ non-zero elements of
GF($p^n$). It is known that for the GF($p^n$), there are $\varphi(p^n - 1)/n$ minimum functions, where
$\varphi$ is the Euler function, $p$ a prime, and $n$ an integer. Minimum functions were very
successfully used in the past in constructing sets of mutually orthogonal Latin squares, balanced
incomplete block designs, confounded and fractional factorial designs. Recently these have 
found a new application in the construction of error-correcting codes. While searching for 
a minimum function of GF (13*), we noticed a lack of comprehensive tables of minimum 
functions in the published literature. A program has been written and all minimum func- 
tions generated for a fairly comprehensive set of values of p and n. 
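
The count of minimum functions quoted above, the Euler function of $p^n - 1$ divided by $n$, is easy to tabulate. The following sketch is only an illustration of that counting formula and not the program described in the abstract; the function names are hypothetical.

```python
def euler_phi(n):
    """Euler's totient function, computed by trial division."""
    result, remaining, d = n, n, 2
    while d * d <= remaining:
        if remaining % d == 0:
            while remaining % d == 0:
                remaining //= d
            result -= result // d
        d += 1
    if remaining > 1:
        result -= result // remaining
    return result


def count_minimum_functions(p, n):
    """Number of minimum functions for GF(p**n): phi(p**n - 1) / n."""
    return euler_phi(p ** n - 1) // n


for p, n in [(2, 3), (2, 4), (3, 2), (13, 2)]:
    print(f"GF({p}^{n}):", count_minimum_functions(p, n))
```

For GF($2^4$) the count is 2, matching the two familiar primitive quartics over GF(2), $x^4 + x + 1$ and $x^4 + x^3 + 1$.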


7. Testing to Establish a High Degree of Safety or Reliability. F. J. ANSCOMBE,
Princeton University and Bell Telephone Laboratories. (Invited paper)


We are concerned with the possibility of establishing the safety of a weapon or the re- 
liability of a component or device by testing a large number of specimens under some stand- 
ard operating conditions and demonstrating that the proportion of failures, p, is very 
small. When possible, a fully economic treatment of such a problem is to be desired, in 
which the expected loss from wrong decisions is assessed and balanced against the cost of 
testing. But sometimes a noneconomic type of requirement must be considered, such as: 
(A) The device will be accepted for service only if the test results permit an assertion with 99% 
confidence that p < 1/2000. Most statisticians will interpret such a requirement, by analogy 
with the definition of a confidence coefficient, as follows: (B) The acceptance rule must be
such that, for all values of p > 1/2000, the least upper bound to the chance of acceptance ≤ 1%.
But it is suggested that the following weaker interpretation is sufficiently stringent and 
more appropriate: (C) The device will be accepted only if the test results justify fair betting 
odds of 99:1 that p < 1/2000. These odds of 99:1 are to be the final betting odds of an ob- 
server who before the trial begins is open-minded and unprejudiced. A suitable prior prob- 
ability distribution for p, relating to such an observer, is proposed, and an acceptance 
boundary for sequential testing is obtained. In order to complete the specification of a 
sequential rule of procedure, it is necessary to add a second boundary, for abandoning the 
trial when the cost of continuing seems excessive. Possible ways of doing this are discussed, 
and a boundary based on a detailed economic analysis is developed. Because the acceptance 
requirement (C) is probabilistic (Bayesian), the validity of the acceptance boundary is 
not affected by the introduction of a boundary for abandoning the trial. 


8. Extreme Values in Gaussian Sequences. SIMEON BERMAN, Columbia University.

The first theorem extends a result of Rényi (1958). Let $(\Omega, \mathcal{B}, P)$ be a probability space
and $\{X_n\}$ a sequence of independent and identically distributed random variables defined
on the space. For each $n$, let $Z_n = \max(X_1, \ldots, X_n)$; suppose that $Z_n$ has a limiting
distribution. If $Q$ is another probability measure on $\mathcal{B}$ which is absolutely continuous with
respect to $P$, then $Z_n$ has the same limiting distribution under $Q$ as under $P$. An application
to a nonstationary Gaussian sequence is given. The second theorem concerns a stationary
Gaussian sequence $\{X_n\}$. Conditions are given on the covariance sequence which
are sufficient for the convergence in probability of $\max(X_1, \ldots, X_n) - (2 \log n)^{1/2}$ to zero.
The conditions are satisfied by the stationary Gaussian Markov process.


9a. On the Foundations of Statistical Inference II (Preliminary report). ALLAN
BIRNBAUM, New York University. (By title) (Abstract printed in the
December, 1960 issue, p. 1216.)

9b. Some Theory and Techniques for Robust Estimation (Preliminary report).
ALLAN BIRNBAUM, New York University.


Let $f(x, \theta, \gamma)$ be the density function of sample point $x$, depending on real parameter $\theta$
and any (nuisance) parameter $\gamma$ of specified ranges. For any given estimator $\theta^* = \theta^*(x)$
of $\theta$, let $r(\theta, \gamma)$ denote the mean-squared-error (m.s.e.) (or alternatively the variance when
it is useful to restrict consideration to unbiased estimators). Admissibility of $\theta^*$ (possibly
in a restricted class of estimators) is defined as usual. $\theta^*$ is called robust (over the specified
range of $\gamma$) if for each $\gamma'$ there exists a corresponding estimator, admissible when $\gamma = \gamma'$
is known, with m.s.e. $r'(\theta, \gamma')$, such that $r(\theta, \gamma')/r'(\theta, \gamma')$ is near unity for all $\theta$. The degree
of attainable robustness depends on the form of $f(x, \theta, \gamma)$, and its determination is of
interest along with the characterization and construction of robust estimators. It is
appropriate here to call an estimator admissibly robust if it is simply admissible. (Problems of
robust confidence limit estimation and testing, and other formulations of point estimation,
can be discussed similarly, taking $r$ as a possibly vector-valued risk function representing
relevant error-probabilities.) Admissibly robust invariant estimators of a location
parameter $\theta$ are characterized (as having a modified Pitman structure), and problems of
computing $r$'s and attainable robustness are discussed. Under restriction to unbiased estimators
linear in ordered observations, admissibly robust estimators are characterized, and fairly
tractable theoretical and computational methods are illustrated.


10. Bayes Rules for the Problem of Choosing the Largest Mean (Preliminary
report). RICHARD P. BLAND, University of North Carolina, AND DAVID
B. DUNCAN, Johns Hopkins University.


Random samples of equal size $m$ are drawn independently from $n$ normal populations
with the same variance $\sigma^2$. A Bayes rule is derived for choosing a superior subset of
population means so as to contain the largest. The loss is the sum of losses for each of the means
involved, these being: zero for the largest mean if chosen, $k_1\delta$ if not; $k_0\delta$ for an inferior mean
if chosen, and zero if not; where $\delta$ is the difference of the mean concerned from the second
largest or largest mean respectively and $k_1 > k_0 > 0$. The population means have
independent identical normal prior densities with variance $\gamma^2\sigma^2/m$. Rejection from the superior
subset depends on the differences of the sample mean concerned from the largest sample
mean, the second largest and so on, the dependencies rapidly diminishing. The differences
necessary for rejection depend on the loss-ratio $k = k_1/k_0$ and the prior variance ratio $\gamma^2$
and, notably, diminish slightly with $n$. These are tabled for $n = 3$, $\sigma^2$ known. A
conservative near-Bayes rule with rejection depending only on the difference from the largest sample
mean is also presented with tables for all $n$ and $\sigma^2$ unknown.


11. Iterated Steepest Ascent on Ellipsoidal Contours. R. J. BUEHLER, B. V.
SHAH, AND O. KEMPTHORNE, Iowa State University.


An arbitrary quadratic response in $n$ variables $y = c_0 + \sum b_i x_i + \sum\sum a_{ij} x_i x_j$ having
a unique minimum or maximum may without loss of generality be replaced by $y = \sum a_i x_i^2$
in which $a_1 \ge a_2 \ge \cdots \ge a_n > 0$; $a_i^{-1/2}$ being equal to the $i$th shortest axis of one of the
family of similar ellipsoidal contours centered at the origin. Starting at an arbitrary point
$P_1$ proceed rectilinearly in the direction of the gradient at $P_1$ until a minimum of $y$ is found
at $P_2$. Iterate to obtain a sequence $P_1, P_2, P_3, \ldots$ of points having responses $y_1, y_2, y_3, \ldots$
converging to zero. The relative success of the $m$th step can be measured by the smallness
of $\rho_m = y_{m+1}/y_m$. For $n = 2$ it is shown that: (i) $\rho_1 = \rho_2 = \rho_3 = \cdots$, so that $y_m$
converges in a geometric progression to zero. (ii) The "least favorable" starting points on any
contour occur where tangents are at 45 degrees to the axes. (iii) At these points one has
$\rho_{\max} = (r^2 - 1)^2/(r^2 + 1)^2$ where $r^2 = a_1/a_2$. For arbitrary $n$ it is shown that: (i) $\rho_1 \le
\rho_2 \le \rho_3 \le \cdots$ so that the rate of convergence never improves. (ii) This rate however is
never worse than that associated with the "least favorable" starting point, which occurs
in the two-dimensional subspace containing the longest and shortest axes. (iii) Starting at
this point $\rho_1 = \rho_{\max} = (r^2 - 1)^2/(r^2 + 1)^2$ where $r^2 = a_1/a_n$.
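
The two-dimensional claims can be tried out directly. The sketch below assumes the canonical response $y = \sum a_i x_i^2$ with an exact line search along the negative gradient, and uses illustrative coefficients $a = (4, 1)$, so that $r^2 = 4$ and the stated worst ratio is $(3/5)^2 = 0.36$. The starting point $(1/a_1, 1/a_2)$ is chosen so that the gradient makes 45 degrees with the axes; all names and numbers here are assumptions for illustration.

```python
def steepest_descent_ratios(a, x, steps):
    """Ratios y_{m+1} / y_m for exact-line-search steepest descent on sum a_i x_i^2."""
    ratios = []
    y = sum(ai * xi * xi for ai, xi in zip(a, x))
    for _ in range(steps):
        g = [2.0 * ai * xi for ai, xi in zip(a, x)]  # gradient of y at x
        # Exact minimizer of y along x - alpha * g.
        alpha = sum(gi * gi for gi in g) / (
            2.0 * sum(ai * gi * gi for ai, gi in zip(a, g)))
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        y_new = sum(ai * xi * xi for ai, xi in zip(a, x))
        ratios.append(y_new / y)
        y = y_new
    return ratios


a = (4.0, 1.0)                               # illustrative coefficients, r^2 = 4
worst_start = [1.0 / a[0], 1.0 / a[1]]       # gradient at 45 degrees to the axes
worst = steepest_descent_ratios(a, worst_start, 8)
other = steepest_descent_ratios(a, [1.0, 1.0], 8)
print(worst)
print(other)
```

Both runs show a constant ratio from step to step, and the run from the 45-degree start attains the bound $(r^2 - 1)^2/(r^2 + 1)^2$.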


12. Steepest Ascent Partan on Ellipsoidal Contours. R. J. BUEHLER, B. V.
SHAH, AND O. KEMPTHORNE, Iowa State University.


In the method of parallel tangents (PARTAN) the ambiguous directions at $P_1$ and $P_2$
(and in $n$ dimensions at $P_4, P_6, \cdots, P_{2n-4}$) can be resolved by proceeding normal to the
corresponding tangent planes (steepest ascent). The resulting procedure is invariant under
translations ($x' = x + c$) and rotations ($x' = Ux$, $U$ orthogonal) but not under changes
of scale ($x' = Ax$, $A$ diagonal). Thus a general quadratic response may be represented in a
canonical form $y = \sum_1^n a_i x_i^2$ ($a_i > 0$). Let $P_1$ be parametrized by squared direction cosines
$l_i^2 = a_i x_i^2/\sum_j a_j x_j^2$, using the $x_i$ values at $P_1$. Define $S_i = \sum_j a_j^i l_j^2$ ($i = 0, 1, \cdots$) and
$\Delta_m$ = $m \times m$ determinant with $(\Delta_m)_{ij} = S_{i+j-2}$ ($m = 2, 3, \cdots$). It is shown that the re-
sponse at $P_{2m-2}$ is $y_{2m-2} = y_1 \Delta_m/\Delta'_m$ ($m = 2, 3, \cdots$), where $\Delta'_m$ is the cofactor of $(\Delta_m)_{11}$.
For any given $n$, $m$ and $a$'s the “poorest” (least fortunate) $P_1$ has been determined ex-
plicitly by locating all maxima of $\Delta_m/\Delta'_m$. Typical results are (i) In $n = 3$ dimensions if
any two $a$'s are equal, $y_4 = \Delta_3 = 0$ for any $P_1$, and (ii) In $n$ dimensions $y_4/y_1 \le
(r^2 - 1)^4/(r^4 + 6r^2 + 1)^2$ for all $P_1$, $a_i$, where $r^2 = a_{\max}/a_{\min}$, and the bound improves
if $a_i \ne \frac{1}{2}(a_{\min} + a_{\max})$ for every $i$.
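
A numerical check of the determinant formula may be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the weights are taken to be $w_i = a_i x_i^2/\sum_j a_j x_j^2$ at the starting point, the moments are $S_i = \sum_j a_j^i w_j$, and the point reached after $k$ stages is assumed to minimize $y$ over the starting point plus the span of the gradient and its images under $\mathrm{diag}(a)$. The function names are hypothetical.

```python
import numpy as np

def hankel_ratio(a, x1, m):
    """Delta_m / Delta'_m built from the moments S_i = sum_j a_j^i w_j."""
    a = np.asarray(a, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    w = a * x1**2 / np.sum(a * x1**2)               # weights at the start; they sum to 1
    S = [float(np.sum(a**i * w)) for i in range(2 * m - 1)]
    H = np.array([[S[i + j] for j in range(m)] for i in range(m)])  # 0-based (i, j) entry is S_{i+j}
    return np.linalg.det(H) / np.linalg.det(H[1:, 1:])             # determinant over cofactor of entry (1, 1)

def subspace_min_ratio(a, x1, k):
    """y_min / y_1, minimizing y = sum a_i x_i^2 over x1 + span{g, Ag, ..., A^(k-1) g}."""
    a = np.asarray(a, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    g = a * x1                                       # gradient direction at x1 (factor 2 dropped)
    K = np.column_stack([a**i * g for i in range(k)])
    M = K.T @ (a[:, None] * K)                       # normal equations of the quadratic
    c = np.linalg.solve(M, -K.T @ (a * x1))
    x = x1 + K @ c
    return float(np.sum(a * x**2) / np.sum(a * x1**2))

a = [5.0, 2.0, 1.0, 0.5]
x1 = [1.0, -1.0, 2.0, 0.5]
for k in (1, 2):
    print(k, subspace_min_ratio(a, x1, k), hankel_ratio(a, x1, k + 1))
```

Under these assumptions the two columns printed for each `k` should agree, with `k = m - 1`.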


13. Asymptotic Relative Efficiency of Mood’s and Massey’s Test Against Some
Parametric Alternatives. I. M. CHAKRAVARTI, F. C. LEONE AND J. D.
ALANEN, Case Institute of Technology.


In an earlier paper the expressions for the exact power of Mood’s and Massey’s tests
were derived, and numerical values of the power were computed for several sample sizes
and selected values of the parameters. The asymptotic relative efficiency of Mood’s
test against normal and rectangular alternatives was derived by Mood and Andrews. In
the present paper, the asymptotic relative efficiency of Mood’s test relative to the likelihood
ratio test for a change in location of an exponential distribution is derived. Further, this is
carried out for all three alternatives for Massey’s test. The asymptotic powers are com-
pared with the exact powers to find out how large a sample size is needed before one could
use the expressions for the asymptotic power.


14. Several-Sided Kolmogoroff-Smirnoff Procedures. HERBERT T. DAVID, Iowa
State University. (By title)


Let $D_n$ be Kolmogoroff’s statistic and let $X^1$ and $X^n$ be the extreme order statistics.
Consider testing $H_0 : F = F_0$ by the non-parametric test based on $\max(D_n, k_1 F_0(X^n),
k_2 (1 - F_0(X^1)))$, leading to a “four-sided” acceptance region for the “stairway” portion
of the sample CDF. A pertinent distributional fact is as follows. Let the statistic
$S_n(x_1, \cdots, x_n)$, $F$ continuous, have structure (d). Let $S_{n;a,b}(x_1, \cdots, x_n)$ be the statistic







$S_n$ corresponding to the uniform distribution over $[a, 1 - b]$, $a$ and $b \ge 0$. Let $\beta$ be the
order in $n$ of the sup, for $x_i$ in $[tn^{-1}, 1 - un^{-1}]$, of

$$|S_{n;0,0}(x_1, \cdots, x_n) - S_{n;tn^{-1},un^{-1}}(x_1, \cdots, x_n)|.$$

Let $\Pr\{S_n < s \mid F\} = \phi_n(s)$, with $\lim \phi_n(sn^{-\alpha}) = \phi(s)$ continuous. Then, if $\alpha + \beta < 0$,
$F_0(X^1)$, $F_0(X^n)$, and $D_n$ are asymptotically independent when the population CDF is $F_0$.
Error bounds are computed for assuming independence for finite $n$. These bounds show that
independence sets in quite early. For example, for $n = 38$, $\Pr\{D_{38} < .2347 \mid F_0\} =
\Pr\{F_0(X^n) < .99932 \mid F_0\} = \sqrt{.95}$, whereas

$$.948 < \Pr\{D_{38} < .2347,\ F_0(X^n) < .99932 \mid F_0\} \le .952.$$


15. A Generalization of a Simple Test Function for Guarantee Time Associated
with the Exponential Failure Law. SATYA D. DUBEY, Proctor & Gamble
Co.


In a previous paper entitled, ‘‘A Simple Test Function for Guarantee Time Associated 
with the Exponential Failure Law’’ the author has shown that a test function based on 
the first and the $r$th ($r \le n$, the sample size) observations for testing the hypothesis on the
guarantee time has several desirable properties (see the abstract of this paper in the same 
issue of this journal). This has prompted the author to consider a generalized simple test 
function based on any two sample observations for the same hypothesis. For the use of this 
simple test function upper 1, 5, and 10 per cent critical values are tabulated up to the 
sample size 10. Several moment recurrence formulas are established which reveal interest- 
ing relationships between its kth and (k — 1)th order moments. Its power functions are 
derived. The results applicable to the special cases of the generalized simple test function 
are mentioned and the useful properties of this test function in some special cases are 
pointed out. 


16. A Simple Test Function for Guarantee Time Associated with the Exponential
Failure Law (Preliminary report). SATYA D. DUBEY, Proctor and Gamble
Co. (By title)

Let the probability density function (p.d.f.) of a time to failure random variable, $T$, be
represented by

$$f_T(t) = \begin{cases} \theta^{-1} e^{-\theta^{-1}(t - G)}, & t > G \text{ and } \theta > 0, \\ 0, & \text{otherwise}. \end{cases}$$

Consider the $H$: $G = G_0$ against the $A$: $G \ne G_0$ and assume $\theta$ to be unknown. On the basis
of the first $r$ ($r \le n$, the sample size) ordered observations we have derived the likelihood
ratio test of the above hypothesis. Under $H$ this test has an $F$ distribution with 2 and $2r - 2$
degrees of freedom. It is completely unbiased, has a monotone power and is a uniformly most
powerful unbiased test. The results cover the work of Paulson (Ann. Math. Stat., Vol. 12
(1941) pp. 301-306). For the same hypothesis an alternative test function based on the first
and $r$th sample observations is suggested. This test function is completely unbiased and
has a monotone power. Furthermore, its power for $G < G_0$, which is of interest in life test-
ing situations, is exactly the same as that of the likelihood ratio test and is independent of $r$.
This immediately suggests taking r = 2 and using only the first two ordered observations 
of the sample. The results go beyond the work of Carlson (Skandinavisk Aktuarietidskrift, 
Haft 1-2 (1958), pp. 47-54). 


17. Asymptotically Most Efficient Single Observation Estimator of Expected
Life for Exponential Failure Law (Preliminary report). SATYA D. DUBEY,
Proctor and Gamble Co. (By title)


For the exponential failure law with known location parameter, the minimum variance 
single observation unbiased estimator of the scale parameter is investigated. It is found 







that if the $r$th ($r \le n$, the sample size) observation in order of increasing time is the single
observation on which this estimate is based and if we write $r = n\delta_n$, then $\lim_n \delta_n = \delta_0$
which is the positive real root of the equation, $\log (1 - \delta) + 2\delta = 0$. It is about 66 per
cent efficient in comparison with the minimum variance unbiased estimator based on all
the $n$ observations in the sample. The sample median has only 48 per cent efficiency. The
smallest sample observation is $100/n$ per cent efficient and the largest sample observation
has asymptotic efficiency of $(600/\pi^2)[(\log^2 n)/n]$. The percentile estimator approach also
yields the same results; however, the present investigation has yielded some interesting
side results.
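
The root of the equation quoted above is easy to locate numerically. The sketch below is illustrative only: `optimal_fraction` finds the root by bisection, and `quantile_efficiency` uses the standard large-sample variance of a sample quantile to obtain the efficiency $(1 - p)\log^2(1 - p)/p$ of an estimator based on the $p$th sample quantile. That efficiency formula is an assumption supplied here, not a formula stated in the abstract.

```python
import math

def optimal_fraction(tol=1e-12):
    """Positive root of log(1 - d) + 2 d = 0, located by bisection on (0.5, 0.99)."""
    def f(d):
        return math.log(1.0 - d) + 2.0 * d
    lo, hi = 0.5, 0.99          # f(0.5) > 0 and f(0.99) < 0, so the root is bracketed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def quantile_efficiency(p):
    """Assumed asymptotic efficiency of the scale estimator based on the p-th sample
    quantile, relative to the estimator using all n observations."""
    return (1.0 - p) * math.log(1.0 - p) ** 2 / p

d0 = optimal_fraction()
print(d0)                        # close to 0.797
print(quantile_efficiency(0.5))  # close to 0.48, the figure quoted for the median
print(quantile_efficiency(d0))   # efficiency at the optimal fraction
```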


18. Central Limit Theorem for Sums Over Sets of Random Variables. FRIEDHELM
EICKER, University of North Carolina.


Many estimates are given as linear combinations $\zeta_n = \sum_k a_{nk}\epsilon_k$ of independent errors
$\epsilon_k$ whose distributions are unknown and non-identical. Some of the asymptotic properties
of these estimates are governed by the central limit theorem (CLT). The question is raised
as to what restrictions must be placed upon the set $F$ of random variables (r.v.) from which
the $\epsilon$'s are taken in any arbitrary order, and also upon the matrix of constants in order
that the CLT holds “over $F$.” A sequence of functions like $\{\zeta_n\}$ is said to converge over
(or: on) $F$ (in any sense) if this is true for any possible choice of a sequence $\epsilon_1, \epsilon_2, \cdots$ with
elements $\epsilon_k$ in $F$. The following theorem is derived: Let $F$ be a set of r.v. with $E\epsilon = 0$, $0 <
E\epsilon^2 < \infty$. Then it is necessary and sufficient for the convergence of $\zeta_n$ on $F$ to the normal
law $N(0, \operatorname{var} \zeta_n)$ for $n \to \infty$ that the following hold simultaneously:

(I) $\max_{k=1,\cdots,n} a_{nk}^2 / \sum_r a_{nr}^2 \to 0$ for $n \to \infty$,

(II) There exists a bounded function $g(c)$ for $c \ge 0$ with $\lim_{c \to \infty} g(c) = 0$ such that for
each r.v. in $F$, $\int_{|\epsilon| > c} \epsilon^2 \, dG(\epsilon) < g(c)$ holds where $G(\epsilon)$ is the distribution function of $\epsilon$.

(III) $\operatorname{var} \epsilon > m > 0$ for all r.v. in $F$.


Application can be made to the least squares estimates in linear regression and in autore- 
gressive schemes of time series. Of particular importance is the bearing upon some non- 
parametric problems and on some nonstationary time series. 


19. Power of a Non-Parametric Test of Independence. REGINA C. ELANDT,
Case Institute of Technology. (Introduced by N. L. Johnson.)


In a previous paper [R. C. Elandt, Zastosowania Matematyki, Vol. 3 (1956), pp. 8-45]
the author proposed a test of independence based on the number of pairs, in the two-dimen-
sional sample $(x_1, y_1), \cdots, (x_{2n}, y_{2n})$, for which $(x_i - Me_x)(y_i - Me_y) > 0$, where
$Me_x$, $Me_y$ are the sample medians of $x$ and $y$ respectively. In the present paper the power
function of this test is investigated. If $Me_x$, $Me_y$ are replaced by the (usually unknown)
population medians, it is possible to evaluate the power function exactly; this is used as 
a basis for a heuristic approximation to the power of the original test. This approximation 
is shown to give good results when the null hypothesis (of independence) is true. The asymp- 
totic relative efficiency of the test is also evaluated. 
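
The counting statistic on which this test is based can be written out directly. The sketch below is a hypothetical illustration (the function name `concordant_count` is not from the paper) and only computes the statistic; it does not reproduce the critical values or the power calculations discussed in the abstract.

```python
from statistics import median

def concordant_count(xs, ys):
    """Number of pairs (x_i, y_i) lying on the same side of both sample medians,
    i.e. with (x_i - Me_x)(y_i - Me_y) > 0."""
    me_x = median(xs)
    me_y = median(ys)
    return sum(1 for x, y in zip(xs, ys) if (x - me_x) * (y - me_y) > 0)

xs = [1.0, 2.0, 3.0, 4.0]
print(concordant_count(xs, [1.0, 2.0, 3.0, 4.0]))  # perfectly increasing relation
print(concordant_count(xs, [4.0, 3.0, 2.0, 1.0]))  # perfectly decreasing relation
```

With an even sample size and no ties, no observation equals a sample median, so every pair contributes either to this count or to its complement.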


20. Location and Scale Parameters in Exponential Families of Distributions 
(Preliminary report). T. S. Ferguson, University of California, Los Angeles. 


If a location parameter, $\theta$, is the parameter of a one-parameter exponential family of
distributions, then the distribution for fixed $\theta$ is either (1) the distribution of $a^{-1} \log X$,
where $X$ has a gamma distribution and $a \ne 0$, or (2) (corresponding to the case $a = 0$) a
normal distribution. This result is extended to the case where the location parameter is a
parameter in a $k$-parameter exponential family of distributions, with the aid of the (prob-
ably superfluous) assumption that the density has $k + 2$ derivatives. Similar results are
derived for scale parameters. If the parameters of a two parameter exponential family of
distributions may be taken to be location and scale parameters, then the distribution must
be normal. The family of distributions of the first mentioned result is thus seen to be a main
class of distributions to which Basu’s theorem (on statistics independent of a complete
sufficient statistic) applies. Furthermore, this family of distributions provides a natural
setting in which to prove certain characterization theorems which have been proved sepa-
rately for the normal and gamma distributions.


21. A Double-Ended Queuing Process. SAMUEL M. GIVEEN, Northeastern
University. (Introduced by Lionel Weiss.)


A queuing model is considered in which the possible states are designated by the positive 
and negative integers and zero. Two types of units, described as ‘‘positive’’ and ‘‘negative’’, 
enter the system. The arrival of a unit of one type constitutes an addition to the queue if 
units of the same type are waiting, and a service completion if units of the other type are 
waiting. Thus, if the system is in any state n, the arrival of a “‘positive’’ unit will always 
shift it to state n + 1 and the arrival of a ‘‘negative’’ unit will always shift it to state 
$n - 1$. Arrivals of the two types of units are governed by Poisson probability laws with
time-dependent parameters $\lambda(t)$ and $\mu(t)$. An explicit transient solution is obtained for the
probability that the system will be in state $n$ at time $t$, but a steady-state solution does not
in general exist. In the special case in which $\lambda$ and $\mu$ are constant, this solution is the well-
known probability of a difference of $n$ between two Poisson variables. In the general case
conditions are determined under which the mean and the variance will remain finite as
$t$ becomes infinite.
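
For the constant-rate special case, the state probability is that of a difference of two independent Poisson counts. The sketch below evaluates it by direct summation; it assumes the system starts in state 0 and that the two counts have means $\lambda t$ and $\mu t$, and the function names and truncation point are illustrative choices, not part of the abstract.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability, computed on the log scale to avoid overflow."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def difference_pmf(n, mean_pos, mean_neg, terms=100):
    """P(N_pos - N_neg = n) for independent Poisson counts with the given positive means.
    The series is truncated after `terms` values of the negative count."""
    total = 0.0
    for j in range(terms):
        i = j + n                      # positive count needed for a difference of n
        if i >= 0:
            total += poisson_pmf(i, mean_pos) * poisson_pmf(j, mean_neg)
    return total

lam, mu, t = 2.0, 1.5, 1.0             # hypothetical constant rates and time
probs = {n: difference_pmf(n, lam * t, mu * t) for n in range(-40, 41)}
print(sum(probs.values()))                      # should be very close to 1
print(sum(n * p for n, p in probs.items()))     # should be close to (lam - mu) * t
```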


22. The Use of Sample Ranges in Setting Exact Confidence Bounds for the
Standard Deviation of a Rectangular Population. H. LEON HARTER,
Wright-Patterson Air Force Base.


A discussion is given of point estimates and interval estimates of the population stand-
ard deviation $\sigma$, based on the sample range and quasi-ranges. In the case of a rectangular
population, the efficient point estimate and the most effective interval estimates are those
based on the sample range, so it is not necessary to consider estimates based on sample
quasi-ranges. The coefficients of the sample range $w$ in the exact confidence bounds for the
population standard deviation $\sigma$ are found by taking the reciprocals of percentage points
of the (standardized) range $W = w/\sigma$. The following tables for the rectangular population
are included: (1) A six-decimal-place table of the percentage points of the range correspond- 
ing to cumulative probabilities P = 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1 (0.1) 
0.9, 0.95, 0.975, 0.99, 0.995, 0.999, 0.9995, 0.9999 for sample sizes n = 2 (1) 20 (2) 40 (10) 100; 
and (2) a table, to seven significant figures or six decimal places, whichever is less accurate, 
of the coefficients of the sample range $w$ in the exact lower confidence bounds for $\sigma$ for the
above values of P and n. 


23. The Moments of the Non-Central t-Distribution. D. HOGBEN, R. S. PINK-
HAM AND M. B. WILK, Rutgers University.


If $X$ is normally distributed with mean $\delta$ and variance 1 independently of $Y^2$ distributed
as chi-squared with $f$ degrees of freedom, then the random variable $t = X f^{1/2}/Y$ is dis-
tributed as non-central $t$. Let $p(x, y^2)$ be the joint density of $X$ and $Y^2$. Then, explicit ex-
pressions for the raw moments of $t$ can be obtained readily from an identity which results
from continued differentiation, with respect to $\delta$, of $\iint p(x, y^2) \, dy^2 \, dx = K(\delta, f)$. A table
of numerical values relevant to the first four central moments and cumulants is given.
Application of the table is illustrated in finding the approximate values of the expectation
of certain functions of the non-central $t$.







24. Some Notes on the Investigation of Heterogeneity in Interactions. N. L. 
JOHNSON, Case Institute of Technology. 
Consider a cross-classification with model $y_{ij} = A + R_i + C_j + z_{ij}$ ($i = 1, \cdots, r$;
$j = 1, \cdots, c$), $\sum R_i = \sum C_j = 0$, and $z_{ij}$'s independent $N(0, \sigma_i)$ variables. To investi-
gate possible differences among the $\sigma_i$'s, criteria based on the statistics

$$S_i = \sum_j (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2$$

may be used.

$$\Big(\bar{y}_{i\cdot} = c^{-1} \sum_{j=1}^{c} y_{ij}, \quad \bar{y}_{\cdot j} = r^{-1} \sum_{i=1}^{r} y_{ij}, \quad \bar{y}_{\cdot\cdot} = (rc)^{-1} \sum_{i=1}^{r} \sum_{j=1}^{c} y_{ij}.\Big)$$

The distribution of the ratio $S_i/S_{i'}$ is shown to be related to that of $g = (\sum_{t=1}^{\nu} w_{1t}^2)/
(\sum_{t=1}^{\nu} w_{2t}^2)$ where the $w$'s are multi-normal unit variables with correlation $\rho$ between
$w_{1t}$ and $w_{2t}$, all other correlations being zero. The distribution of $g$ is shown to be a mixture
of $F$ distributions with $\nu + 2k$, $\nu + 2k$ degrees of freedom, in proportions given by terms
in the expansion of the negative binomial

$$\Big(\frac{1}{1 - \rho^2} - \frac{\rho^2}{1 - \rho^2}\Big)^{-\frac{1}{2}\nu}.$$

Some approximations to the distribution of $S_i/S_{i'}$ are discussed and conjectural extensions
to the joint distribution of $S_1, S_2, \cdots, S_r$ are mentioned.

25. On the Limiting Distribution of $-2 \log \lambda$ in the Non-regular Case. DONALD
A. JONES, University of Michigan.


Consider a family, say $F$, of pdfs (Lebesgue) on $E^1$ which satisfies a set of regularity
conditions w.r.t. a real-valued parameter. A new family of pdfs, say $G$, can be constructed
from $F$ by doubly truncating each member of $F$ at unknown points, $a < b$. Let $x =
(x_1, x_2, \cdots, x_m)$ be a random variable satisfying the following conditions:

(i) the real random variables $x_1, x_2, \cdots, x_m$ are independent and,

(ii) the distribution of $x_j$ has a pdf in $G$, $j = 1, 2, \cdots, m$.

Observe that the pdf of $x$ belongs to a family indexed by a $3m$-dim. parameter, say $\theta$ with
range $\Omega$, where $m$ of the dimensions are from $F$ and $2m$ were introduced by truncation. Let
$X_1, X_2, \cdots, X_n$ be $n$ random observations of $x$ and denote their pdf, as a function of $\theta$,
by $P_n$. The limiting distribution of $-2 \log \lambda$ ($n \to \infty$), where

$$\lambda = \text{l.u.b.}\,[P_n : \theta \in \omega] \,/\, \text{l.u.b.}\,[P_n : \theta \in \Omega]$$

and $\omega \subset \Omega$, is shown to be, for some special classes of $\omega$ and assuming that $\theta \in \omega$ obtains,
a chi-square distribution with $2b + d$ d.f. where $b = (\dim \Omega - \dim \omega)$ w.r.t. the “trun-
cation parameters” and $d = (\dim \Omega - \dim \omega)$ w.r.t. the “regular parameters”. The
motivation is hypothesis testing by the likelihood ratio test and the methods of proof are
standard limit theorems.


26. Use of Some a priori Knowledge in the Estimation of Means from Double
Samples. S. K. KATTI, Florida State University.


When there is no a priori knowledge available regarding the values of the mean $\mu$, the
overall sample mean $\bar{x}$ has many desirable properties as an estimate of $\mu$. The problem
considered here is one in which the statistician has a guessed estimate (guestimate) on $\mu$
either due to his past experience or due to his acquaintance with the behavior of the system.
It is found that if $\mu_0$ is the guestimate, the true value is close to $\mu_0$, the variance $\sigma^2$ is known
and if it is possible to take samples of size $n_1$ and $n_2$ in succession, an estimate with an
expected mean square smaller than that of the overall mean $\bar{x}$ is obtained by using the







mean of the first sample only, ignoring the second if this mean lies in the interval
$(\mu_0 - \sigma/(2n_1 + n_2)^{1/2},\ \mu_0 + \sigma/(2n_1 + n_2)^{1/2})$ and by using the overall mean if it does not.
If the observations are independent, have a common normal distribution and $\mu$ is close to $\mu_0$,
this method gives an estimate with an expected mean square 3.2% smaller than that of
$\bar{x}$ when $n_1/n_2 = 0.33$ and the sample size used to obtain this estimate is 74.9% of $(n_1 + n_2)$,
i.e., there is a saving of 25.1%.


27. On the Expected Value and Variance of a Ratio Estimate. J. C. Koop, 
North Carolina State College. (By title) 


From the data of sample surveys various kinds of ratio estimates and their standard 
errors are computed, often without questioning the validity of the underlying mathematical 
procedures used in deriving the formulas. The limitations of the classical technique of 
deriving the expected value and the variance of ratio estimates of finite populations by a 
series expansion are briefly discussed in this paper. By a new device of expanding the ratio 
or its powers as a truly convergent series, and then determining the expected values term 
by term, expressions differing from the classical ones only by a constant multiplying factor 
are obtained. This constant multiplier is an element of the inequality which shows what 
minimum value it shall assume to insure that the series expansion is valid. 


28. On the Higher Moments of Linear Estimates Based on Multistage Samples 
from a Finite Population. J. C. Koop, North Carolina State College. 


The theorem for the derivation of higher moments of linear estimates based on multi- 
stage samples from a finite population is derived. An rth order moment of a random vari- 
able about its mean or expected value is shown to be equal to the expected value (or the 
first moment) of the conditional moment of the same order, plus the rth moment of the 
conditional expected value, plus a series of product-moments of conditional expected 
values of various orders (of course not exceeding r) each of them with appropriate binomial 
coefficients. For the case when r = 2, we obtain the theorem given in the well-known text- 
book of Hansen, Hurwitz and Madow. In the context of multistage (or multiphase) sam- 
pling the net result of all moments of an estimate about its expected value of order lower 
than r can be interpreted as increase due to extra steps in sampling beyond the first step 
(stage or phase). The theorem is applied to obtain the third and fourth moments of linear 
estimates based on multistage samples from a finite population with the following prob- 
ability systems which are very general: 

(i) first stage units are selected with equal probabilities and without replacement but 
the probabilities for the selection of units in the subsequent stages are left undefined. 

(ii) first stage units are selected with unequal probabilities and with replacement but
the probabilities for the subsequent stages are left undefined as above.
When the probabilities beyond the first stage and the structure of the sample (i.e., sample 
design relevant to those steps in question) are defined, then the conditional moments can 
all be evaluated in terms of the moments of the underlying variates. With these moments 
the expressions for the classical coefficients of kurtosis and skewness of an estimate may be 
obtained. For the above cases (and indeed for all multistage estimates) the approach to 
approximate normal form is retarded because of the presence of the product moments. 


29. On Simultaneous Tests in Nested Designs (Preliminary report). P. R. 
KRISHNAIAH, Remington Rand UNIVAC. 
Consider the “fixed effects” model $x_{ijkm} = a_{ijk} + e_{ijkm}$ ($i = 1, 2, \cdots, p$; $j = 1, 2, \cdots, q$;
$k = 1, 2, \cdots, r$; $m = 1, 2, \cdots, s$) where the errors $e_{ijkm}$ are distributed independently and
identically with zero means and variance $\sigma^2$; $a_{ijk}$ is the effect of $k$th level of $C$ within $j$th






level of $B$ within $i$th level of $A$ and $\sum_i \sum_j \sum_k a_{ijk} = 0$. Now consider the following
hypotheses

$H_{ij}$ : $a_{ijk} = \bar{a}_{ij\cdot}$ for all $k$
$H_i$ : $\bar{a}_{ij\cdot} = \bar{a}_{i\cdot\cdot}$ for all $j$
$H$ : $\bar{a}_{i\cdot\cdot} = \bar{a}_{\cdot\cdot\cdot}$ for all $i$

where $\bar{a}_{ij\cdot} = r^{-1} \sum_k a_{ijk}$; $\bar{a}_{i\cdot\cdot} = q^{-1} \sum_j \bar{a}_{ij\cdot}$; $\bar{a}_{\cdot\cdot\cdot} = p^{-1} \sum_i \bar{a}_{i\cdot\cdot}$. The present paper dis-
cusses testing $pq$ hypotheses of the form $H_{ij}$, $p$ hypotheses of the form $H_i$ and the hypoth-
esis $H$ simultaneously by using (1) the Simultaneous ANOVA Test and (2) the Joint
(Overall F) Test. The lengths of the simultaneous confidence bounds associated with the
above tests are compared. These results are generalized to multivariate situations and to
higher order hierarchical classification.


30. A Multivariate Analogue of One-Sided Test (Preliminary report). AKIO
KUDO, University of Michigan Medical School.


Consider a sample of size $n$ from a $k$-variate normal population with unknown means
$m_i$ ($i = 1, \cdots, k$) and a known variance matrix $\Lambda$. The likelihood ratio test of the null
hypothesis $H_0$ : $m_i = 0$ ($i = 1, \cdots, k$) against the alternative hypothesis $H_1$ : $m_i \ge 0$
($i = 1, \cdots, k$) where inequality is strict for at least one of the $k$ will be considered. The
test is based on the statistic $K^2 = n[\bar{x}'\Lambda^{-1}\bar{x} - \min (\bar{x} - m)'\Lambda^{-1}(\bar{x} - m)]$, where $\bar{x}$ and $m$
are the sample and the population mean vectors and the minimum is over the region $m_i \ge 0$
($i = 1, \cdots, k$). The computation of $K^2$ will be discussed, and we shall prove

$$\Pr(K^2 \ge K_0^2) \le \sum_{\emptyset \subset M \subseteq K} \Pr(\chi^2_{n(M)} \ge K_0^2)\, P_M (1 - \bar{P}_M),$$

where the summation runs over all non-empty sets $M$ included in or equal to the set $K =
\{1, 2, \cdots, k\}$, $n(M)$ is the number of elements in $M$, $P_M = \Pr(\bar{x}_i \ge 0, i \in M)$, $\bar{P}_M =
\Pr(\bar{x}_i \ge 0, i \notin M \mid \bar{x}_i = 0, i \in M)$. $P_M$ and $\bar{P}_M$ are probabilities under the null hypothesis,
and the latter is a conditional probability equal to zero when $M = K$.


31. Power of Mood’s and Massey’s Test Against Exponential and Rectangular 
Alternatives. F. C. Leone, I. M. CHAKRAvVARTI AND J. D. ALANEN, Case 
Institute of Technology. 


When sample observations are originally recorded in order of their magnitude, Mood’s 
test, based on the median of the combined samples, and Massey’s extension of Mood’s 
test based on fractiles, have much to commend themselves as quick tests. In Mood’s test 
one need record observations only up to the median of the combined samples, and in Mas- 
sey’s test, up to the highest fractile included in the test. It is known (Mood, (1954) and 
Andrews, (1954)) that the asymptotic relative efficiency of Mood’s test against normal 
and rectangular alternatives is low. However, it was felt that there was enough gain in 
Massey’s extension to justify this investigation. The objects of the present investigation 
are: 

(i) to derive the exact power functions of Mood’s and Massey’s tests for two samples 
against parametric alternatives of exponential and rectangular populations; 

(ii) to tabulate them for comparable sample sizes in order to get an idea about their re-
spective performances and also to evaluate if there is any resultant gain in the use of Mas-
sey’s test (which uses more than one fractile and hence is more elaborate) over Mood’s
test.







32. On some Properties of Compositions of an Integer and their Application to
Probability Theory. T. V. NARAYANA AND S. G. MOHANTY, University of
Alberta.


A partial order called “domination” has been defined on the $r$-compositions of $n$ and the
$r$-compositions of $n$ form a distributive lattice ($1 \le r \le n$) (Narayana and Fulton, Canad.
Math. Bull., Vol. 1, No. 3). Using an obvious partial order, the distributive lattice formed
by the simple symmetric sampling plans of size $2n$ is shown to be isomorphic to the lattice
formed by the $n$-compositions of $3n$ dominated by the $n$-composition $(3, 3, \cdots, 3)$ of $3n$.
A natural one-to-one correspondence between the simple symmetric sampling plans of
size $2n$ and simple sampling plans of size $n$ induces the lattice on the simple sampling plans
of size $n$ and as a consequence the number of such plans is found to be [expression illegible
in the source]. Since
it is easily seen that this distributive lattice is isomorphic to a distributive lattice formed
by the minimal lattice paths in a classical ballot problem ([1] Grossman, Scripta Mathe-
matica, Vol. 15, No. 1-2; [2] Dvoretzky and Motzkin, Duke Math. J., 1947), the solution to
the problem is rederived and a further generalization of it is obtained. The authors hope to
extend the application of this unified approach to various other problems in probability
theory.


33. Fractional Factorial $2^m$ and $3^m$ Designs with and without Blocks, Preserving
the Main Effects and the Two-Factor Interactions. M. S. PATEL, Uni-
versity of North Carolina and Research Triangle Institute.


The object of this paper is to construct fractional factorial designs which have a number
of treatment combinations just sufficient to estimate the main effects, the two-factor inter-
actions, and the error, without requiring difficult computation. Starting with orthogonal
arrays of strength $d$ as defined by Rao, fractional designs have been built up by combining
the treatment combinations belonging to these arrays, the number of arrays to be taken
depending upon their strength and the number of effects to be estimated. It has been shown,
for instance, that if the treatment combinations regarded as points satisfying the equations
$a_{\alpha 1}x_1 + a_{\alpha 2}x_2 + \cdots + a_{\alpha m}x_m = d_\alpha$ ($\alpha = 1, 2, \cdots, r$) in GF(2) form an array $(2^{m-r}, m, 2, 2)$,
then the totality of treatment combinations satisfying the $(r + 1)$ such arrays given by
the solutions of $AXJ' = D$ in GF(2) where $A = ((a_{\alpha i}))$, $\alpha = 1, 2, \cdots, r$; $i = 1, 2, \cdots, m$;
$X' = (x_1, x_2, \cdots, x_m)$, $D = (0, I_r)$ and $J' = (1, 1, \cdots, (r + 1)$ times$)$, are sufficient to
estimate the main effects and the two-factor interactions, assuming all higher factor inter-
actions to be absent. The same has been worked out with component arrays, each of strength
3. Finally, assigning the treatments belonging to $r + 1$ component arrays to blocks, block
designs have been obtained. Following a slightly different procedure, fractional designs
for the $3^m$ series have also been obtained.


34. On the Performance of the Group-Screening Method. M. S. PATEL, Re-
search Triangle Institute. (By title)


This paper discusses the efficiency of two-stage group-screening designs relative to
the corresponding single-stage designs. The criterion of efficiency is the expected number
of correct decisions that are made using the design. Only orthogonal designs are considered.
Let $\gamma$ be the size of the critical region used for decision making in a single-stage design and
$\alpha$, $\beta_n$ ($n = 1, 2, \cdots, g$) be sizes of the critical regions in a two-stage design when $n$ out of
$g$ group-factors are declared effective. Under certain assumptions, sufficient conditions are
developed under which $\max_{\alpha, \beta_n}$ (expected number of correct decisions) exceeds $\max_{\gamma}$ (ex-
pected number of correct decisions) and vice-versa. Some other side results of interest are
also given.







35. On the Distribution of First Significant Digits. ROGER PINKHAM, Rutgers
University. (By title)


Consider an initial distribution on the positive real numbers. If these numbers are ex- 
pressed base 10, a distribution of first significant digits is thereby induced. It is shown: (1) 
The only distribution of significant digits which is invariant under scale change of the ini- 
tial distribution is the logarithmic, (2) A wide range of initial distributions results in the
logarithmic law to a high degree of approximation.
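
The logarithmic law referred to here assigns probability $\log_{10}(1 + 1/d)$ to first digit $d$. The sketch below is a hedged illustration, not the paper's argument: it assumes an initial distribution whose base-10 logarithm is uniform over one decade (approximated by an evenly spaced grid) and checks that rescaling by an arbitrary constant leaves the first-digit frequencies close to the logarithmic law. The function names and the grid size are arbitrary choices.

```python
import math

def log_law(d):
    """Logarithmic first-digit probability for d = 1, ..., 9."""
    return math.log10(1.0 + 1.0 / d)

def first_digit(x):
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:e}"[0])

def digit_frequencies(scale, n=20000):
    """First-digit frequencies of scale * 10**u for u on an even grid in [0, 1)."""
    counts = [0] * 10
    for k in range(n):
        u = (k + 0.5) / n
        counts[first_digit(scale * 10.0 ** u)] += 1
    return [c / n for c in counts]

for scale in (1.0, 7.3):                      # 7.3 is an arbitrary change of scale
    freqs = digit_frequencies(scale)
    print(scale, max(abs(freqs[d] - log_law(d)) for d in range(1, 10)))
```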


36. A posteriori Distributions in the Translation Parameter Case (Preliminary
report). MARTIN FOX AND HERMAN RUBIN, Michigan State University.


Let $(X, Y)$ have distribution function $F(x - \theta, y)$ with $\theta$ unknown. Assume $X$ and $\theta$
are restricted to the same linear set $S$ of real numbers (such as the integers or the rationals)
while $Y$ can take values in any space. Let the a priori distribution of $\theta$ be “uniform” on
$(-\infty, \infty) \cap S$. Let $\xi_{x,y}$ be the a posteriori distribution function of $\theta$ given $X = x$, $Y = y$.
Let $\theta_{x,y;\alpha}$ be the $\alpha$th quantile of $\xi_{x,y}$ and let $\theta_{x,y}$ have distribution function $\xi_{x,y}$. Then, (i)
given $\theta$, the $(1 - \alpha)$th quantile of $\theta_{X,Y;\alpha}$ is $\theta$; (ii) if it exists, then

$$E_\theta(\theta_{X,Y} - \theta)^n = \sum_k (-1)^k \binom{n}{k} [\mu_k \mu_{n-k} + \operatorname{cov}(\mu_{k,Y}, \mu_{n-k,Y})]$$

where $\mu_k = E_\theta(X - \theta)^k$ and $\mu_{k,y} = E_\theta((X - \theta)^k \mid Y = y)$; and (iii) given $\theta$, the distribution
of $\theta_{X,Y} - \theta$ is symmetric with absolute maximum at 0. The proofs of (i) and (ii) follow from
the observation that $\xi_x(E) = F(x - E)$ for any $E \subset S$. The proof of (iii) consists in show-
ing that $\theta_{X,Y} - \theta = U - V$ where $U$ and $V$ are independent and identically distributed.
Statement (iii) can be extended to the case of distributions invariant under a locally com-
pact group transitive on the parameter space.


37. Asymptotic Relative Efficiency of Mood’s Test for Two-Way Classification.
Y. S. SATHE, University of North Carolina. (By title)

Let the distribution of $x_{ij}$ ($i = 1, 2, \cdots, r$; $j = 1, 2, \cdots, c$) be $F_{ij}(x) =
F(x + \nu + \alpha_i + \beta_j)$. Then under the null hypothesis $H_0 : \alpha_i = 0$ ($i = 1, 2, \cdots, r$), all
the observations in a given column have the same distribution. Let $\tilde{x}_j$ be the median of obser-
vations in the $j$th column and in the two way table, let $x_{ij}$ be replaced by 1 if it exceeds
$\tilde{x}_j$ or by 0 if it does not. Let $m_i$ be the number of 1’s in the $i$th row. Then Mood and Brown
show that under the null hypothesis

$$\chi_M^2 = \frac{r(r - 1)}{ca(r - a)} \sum_{i=1}^{r} \Big(m_i - \frac{ca}{r}\Big)^2,$$

where $a = r/2$ if $r$ is even or $a = (r - 1)/2$ if $r$ is odd, has a limit $\chi^2$ distribution with $r - 1$
d.f. as $c \to \infty$. In this paper, we show that under $H_1 : \alpha_i = \delta_i/c^{1/2}$ where $\sum_i \delta_i = 0$ but
not all $\delta_i$’s are 0 and $r$ odd, $\chi_M^2$ has a limit noncentral $\chi^2$ with $r - 1$ d.f. and noncentrality
parameter

$\lambda$ = [expression illegible in the source]

where $f(y)$ is the p.d.f. For $r = 3$, $f(y) = (2\pi)^{-1/2} e^{-y^2/2}$, $\lambda = (27/16\pi) \sum \delta_i^2$. For the usual
$F$-test, the corresponding noncentrality parameter is $\sum \delta_i^2$. Thus a.r.e. of $\chi_M^2$ compared
with $F$-test for $r = 3$ is $27/16\pi = .54$.
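
The null-hypothesis statistic can be computed from a two-way table as sketched below. This is an illustration under assumptions, not the author's code: it takes the statistic to have the Mood and Brown form $\frac{r(r-1)}{ca(r-a)} \sum_i (m_i - ca/r)^2$ with $a = r/2$ for even $r$ and $a = (r-1)/2$ for odd $r$, assumes no ties within a column, and uses the hypothetical name `median_test_statistic`.

```python
from statistics import median

def median_test_statistic(table):
    """Two-way median test statistic for a table given as a list of r rows,
    each holding c observations (assumed form; see the lead-in)."""
    r = len(table)
    c = len(table[0])
    a = r // 2                                   # r/2 for even r, (r - 1)/2 for odd r
    col_medians = [median(table[i][j] for i in range(r)) for j in range(c)]
    # m[i] = number of entries in row i that exceed their column median
    m = [sum(1 for j in range(c) if table[i][j] > col_medians[j]) for i in range(r)]
    factor = r * (r - 1) / (c * a * (r - a))
    return factor * sum((mi - c * a / r) ** 2 for mi in m)

# Hypothetical 3 x 4 table in which the first row is always the largest in its column.
print(median_test_statistic([[3, 3, 3, 3], [2, 2, 2, 2], [1, 1, 1, 1]]))
# Hypothetical 3 x 3 table in which each row is largest exactly once.
print(median_test_statistic([[3, 1, 2], [2, 3, 1], [1, 2, 3]]))
```

Large values of the statistic indicate that some rows exceed the column medians more often than the balanced count $ca/r$ would suggest.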







ABSTRACT WITHDRAWN IN PROOF 


39. The Method of Parallel Tangents (PARTAN) for Finding an Optimum.
B. V. SHAH, R. J. BUEHLER AND O. KEMPTHORNE, Iowa State University.


Let y be a quadratic function of n = 3 variables (or a monotone function thereof) hav-
ing a minimum; let $P_1P_2P_3P_4P_5P_6$ be a polygonal line such that $P_i$ occurs at the minimum
value of y on the (extended) line $P_{i-1}P_i$; let $\pi_i$ be the plane tangent to the contour of y
at $P_i$. For any $P_1$, $P_1P_2$ may be any direction of decreasing y. $P_2P_3$ may be any decreasing
direction parallel to $\pi_1$. $P_4$ is on the line $P_1P_3$ beyond $P_3$. $P_4P_5$ is parallel to both $\pi_1$ and
$\pi_2$. Then $\pi_2$ and $\pi_5$ are parallel and the minimum of y will be at $P_6$, which is on the line
$P_2P_5$ beyond $P_5$. The generalization to higher dimensions is to take $P_{2i-4}$, $P_{2i-1}$, and $P_{2i}$
colinear. Then $P_{2i}$ is the minimum in the linear space spanned by the preceding points.
The proof is given for hyperspherical contours, and is extended to hyperellipsoidal con-
tours by a general affine transformation. The instructions are invariant under change of
units and, more generally, under any affine transformation by virtue of the exclusive use
of parallel, rather than normal, directions. In two dimensions $P_4$ is also the minimum for
a quite general class of contours all of which make the same angle with any given ray from
the minimum.
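The three-variable construction can be traced numerically. The following is a minimal sketch, not the authors' program: it assumes the quadratic is written as y(x) = (1/2) x'Ax − b'x with A positive definite, takes the negative gradient as the "direction of decreasing y" at P1 and P2, and uses a cross product to obtain a direction parallel to both tangent planes. The matrix A, the vector b, and all function names are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def grad(A, b, x):
    # Gradient of y(x) = 0.5 x'Ax - b'x; it is normal to the tangent plane at x.
    return sub(matvec(A, x), b)

def line_min(A, b, x, d):
    # Exact minimizer of y on the extended line x + t d (t may be negative).
    t = -dot(grad(A, b, x), d) / dot(d, matvec(A, d))
    return [xi + t * di for xi, di in zip(x, d)]

def partan3(A, b, p1):
    g1 = grad(A, b, p1)
    p2 = line_min(A, b, p1, [-g for g in g1])   # P1P2: a decreasing direction
    g2 = grad(A, b, p2)                         # g2 is orthogonal to g1
    p3 = line_min(A, b, p2, [-g for g in g2])   # P2P3: decreasing, parallel to pi_1
    p4 = line_min(A, b, p1, sub(p3, p1))        # P4 on the line P1P3
    p5 = line_min(A, b, p4, cross(g1, g2))      # P4P5 parallel to pi_1 and pi_2
    p6 = line_min(A, b, p2, sub(p5, p2))        # P6 on the line P2P5
    return p6

# Illustrative positive definite example (values chosen arbitrarily).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
p6 = partan3(A, b, [0.0, 0.0, 0.0])
residual = grad(A, b, p6)   # should vanish (up to rounding) at the minimum
```

Because every step is an exact one-dimensional minimization, the gradient at P6 should be zero to rounding error, which is what the abstract asserts for quadratic y.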


40. Selection of the Best Treatment in a Paired-Comparison Experiment. B. J.
Trawinski and H. A. David, Virginia Polytechnic Institute.


The results of a balanced paired-comparison experiment in which ties are not permitted 
can be summarized in the scores (number of preferences) achieved by the treatments being 
judged. Let $\pi_{ij}$ ($i, j = 1, 2, \ldots, t$; $i \ne j$) denote the constant probability that the ith
treatment $T_i$ is preferred to $T_j$ in any one of n comparisons of the two. Then the best treat-
ment is the one with the highest average probability $\pi_{i\cdot}$ of success. Following the gen-
eral approach of Gupta and Sobel one can select a subset S of the $T_i$ by including in S the
$T_i$ associated with the top score $k_{(1)}$ and with all other scores $\ge k_{(1)} - \nu$. With proper
choice of $\nu$ this decision rule has the property that the best treatment is included in S with
at least a specified probability. Tables have been constructed giving $\nu$ as a function of t and n.
Suppose next that $\pi_{tj} = \pi > \tfrac12$ ($j = 1, 2, \ldots, t - 1$) and $\pi_{ij} = \tfrac12$ otherwise. For this case the
smallest number of replications n has been determined, for given t and $\pi$, which ensures
that with at least a specified probability the highest score corresponds to $T_t$, the best treatment. The
formulation of the problem is that of Bechhofer but in the present case the procedure does 
not have the usual conservative properties associated with Bechhofer’s approach. 
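The subset rule itself is simple to state in code. This is a minimal sketch under the assumption that the scores are already tallied and that the allowance ν has been read from the authors' tables, which are not reproduced here; the function name and the example scores are hypothetical.

```python
def select_subset(scores, nu):
    """Return the indices of treatments whose score is within nu of the top score."""
    top = max(scores)
    return [i for i, s in enumerate(scores) if s >= top - nu]

# Hypothetical scores for t = 4 treatments; nu = 1 is an arbitrary illustration.
chosen = select_subset([7, 5, 9, 8], nu=1)
```

A larger ν gives a larger subset and hence a higher probability of including the best treatment.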


41. A Plotting Procedure in MANOVA. M. B. Wilk and R. Gnanadesikan,
Bell Telephone Laboratories, Murray Hill. (Invited paper) 


A procedure is presented for the generalization and extension, to multiresponse factorial 
experiments, of the technique of half-normal plotting for uni-response factorials. Consider, 
for definiteness, two-level factorial experiments wherein, for each treatment combination, 
p responses are observed. For this multiresponse situation, the analogue of the uni-response
single degree of freedom contrasts is a vector of p elements, each element being a single
degree of freedom contrast corresponding to one of the responses. A positive semidefinite 
or definite quadratic form in the elements of each of these vectors is obtained (for example, 
the squared length of the vector). This is interpretable as a distance function in a meaning- 
ful space. The null distribution of the quadratic form is approximated as a gamma with 
two parameters. Under reasonable experimental assumptions, the quadratic forms are 
mutually independent. Using only the m smallest of the quadratic forms as a sample of the
first m order statistics from a sample of size k (where $m \le k \le N$, the total number of
contrasts) from a gamma distribution, maximum likelihood estimates are obtained for the
parameters of the gamma distribution. Using the estimates, a “gamma plot” is made of
the ordered quadratic forms. Interpretations and uses of the plot are discussed, with ex-
amples.


42. On the Estimation of Error Variance and Number of Significant Effects in
Two-level Factorial Experiments. M. B. Wilk, R. Gnanadesikan, and
Miss A. R. Esner, Bell Telephone Laboratories, Murray Hill. (Invited
paper)


This paper is concerned with the estimation of the error variance, $\sigma^2$, and the number
of “significant” contrasts, r, when a collection of N contrasts, each based on one degree
of freedom, is analyzed. A typical situation giving rise to this is the set of $N = 2^n - 1$
estimates of effects obtained from the responses in a $2^n$ factorial experiment. Let
$y_1 < y_2 < \cdots < y_N$ denote the ordered squared contrasts. Suppose that the m smallest (m specified)
are known to “belong to error”, and that k = N − r, the number of error contrasts, is known.
Tables are provided for determining at once the maximum likelihood estimate of $\sigma^2$ based
on the first m order statistics $y_1, \ldots, y_m$. A summary of a Monte Carlo study of the
properties of this estimate is given. Though k will in general not be known, it is felt that
this procedure, based on a “guessed” k, will be less biased than one based on either pre- or
post-experiment “assignment” of effects to error or of the “conservative” estimation of
error from all the contrasts in the collection. The procedure is buttressed by an informal 
sequential inference scheme which is specifically directed toward the determination of k 
from the data. Examples and results of some Monte Carlo studies are given. 


43. Estimation of Parameters of the Gamma Distribution Using Order Statistics. 
M. B. Wilk, R. Gnanadesikan, and Miss M. J. Huyett, Bell Telephone
Laboratories, Murray Hill. (Invited paper) 


Using the m smallest observations in a random sample of size K from a gamma distribu- 
tion whose density function is, 


$$f(y; \lambda, \eta) = [\lambda^{\eta}/\Gamma(\eta)]\, y^{\eta-1} e^{-\lambda y}, \qquad y > 0,\ \lambda > 0,\ \eta > 0,$$


the problem of estimating $\lambda$ and $\eta$ is considered. Let P and S denote, respectively, the
ratios of the geometric mean and the arithmetic mean of the m smallest observations to the
mth observation. Then $0 \le P \le S \le 1$, and P and S are jointly sufficient for $\lambda$ and $\eta$.
Tables are given, for various values of K/m, P and S, from which the maximum likelihood
estimates of $\lambda$ and $\eta$ are easily obtainable.


44. Probability Plots for the Gamma Distribution. M. B. Wilk, R. Gnanadesikan,
and Miss M. J. Huyett, Bell Telephone Laboratories, Murray
Hill.


If $y_1 \le y_2 \le \cdots \le y_n$ is an ordered random sample from the general gamma distribu-
tion with density


$$f(y; \alpha, \lambda, \eta) = [\lambda^{\eta}/\Gamma(\eta)]\,(y - \alpha)^{\eta-1} e^{-\lambda(y-\alpha)}, \qquad \alpha \le y < \infty,$$


where $-\infty < \alpha < \infty$, $\lambda > 0$ and $\eta > 0$, then a plot of the y values against appropriate
quantiles of the standard gamma distribution ($\alpha = 0$, $\lambda = 1$) would be expected to yield
a straight line pattern with intercept $\alpha$ and slope $1/\lambda$. This paper describes a systematic
numerical method for calculating the quantiles of the standard gamma distribution cor-
responding to the fractions $b_i = (i - \tfrac12)/n$, $i = 1, 2, \ldots, n$, for given values of $\eta$. A table
of quantiles as a function of $\eta$ is given, together with several samples of gamma plots. Vari-
ous uses of such plots are discussed, with examples. The entire procedure, viz the calcula-
tion of quantiles and the actual plotting, has been programmed for a high-speed computer
and that procedure (with instructions for use) is described in an Appendix.
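The quantile calculation can be illustrated as follows. This is only a sketch and not the numerical method of the paper, which the abstract does not spell out: it assumes a power-series evaluation of the standard gamma distribution function and a plain bisection search, and the function names are hypothetical.

```python
import math

def gamma_cdf(x, eta):
    """Standard gamma (alpha = 0, lambda = 1) distribution function via the series
    gamma(eta, x) = x**eta * exp(-x) * sum_k x**k / (eta (eta+1) ... (eta+k))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / eta
    total = term
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= x / (eta + k)
        total += term
    return math.exp(eta * math.log(x) - x - math.lgamma(eta)) * total

def gamma_quantile(p, eta):
    """Quantile of the standard gamma distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, eta) < p:      # bracket the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, eta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def plotting_quantiles(n, eta):
    """Quantiles at the fractions (i - 1/2)/n, i = 1, ..., n."""
    return [gamma_quantile((i - 0.5) / n, eta) for i in range(1, n + 1)]

quantiles = plotting_quantiles(5, 2.0)
```

Plotting the ordered observations against these quantiles should give points scattered about a line with intercept α and slope 1/λ when the assumed η is appropriate.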


(Abstracts of papers to be presented at the Annual Meeting of the Institute, Seattle, Washington, 
June 14-17, 1961. Additional abstracts will appear in the September, 1961 issue.) 


1. An Optimal Sequential Accelerated Life Test. Stuart A. Bessler, General
Telephone and Electronic Laboratories, Herman Chernoff, Stanford
University, and Albert W. Marshall, Stanford University and Institute
for Defense Analyses, Princeton.


Suppose that the distribution of lifetime T of a device subjected to stress x is exponential
with failure rate (reciprocal of mean) equal to $\theta_1 x + \theta_2 x^2$ for $0 \le x \le x^*$. It is desired to
test $H_1$: $\theta_1 x_0 + \theta_2 x_0^2 \le a$ versus $H_2$: $\theta_1 x_0 + \theta_2 x_0^2 > a$. The cost of experimenting at stress
level x is assumed to be proportional to the mean lifetime $(\theta_1 x + \theta_2 x^2)^{-1}$. An asymptotically
optimal sequential procedure, characterized by Chernoff, Albert, and Bessler, is derived.
After each observation it calls for selecting the next observation from one of two levels.
These levels correspond to the solution of a two-person zero-sum game whose payoff function
is a Kullback-Leibler information number divided by the cost of experimentation.


2. A Statistic Related to Kolmogorov’s. H. D. Brunk, University of Missouri. 


A distribution-free statistic is proposed for testing the hypothesis that a sample comes 
from a population with prescribed distribution function F. It occupies a position inter-
mediate between Kolmogorov’s (Inst. Ital. Attuari, Giorn., Vol. 4 (1933), pp. 1-11) and
Sherman’s (Ann. Math. Stat., Vol. 21 (1950), pp. 339-361), and the corresponding test ap-
pears more powerful than Kolmogorov’s against certain alternatives (e.g., different scale
parameter, for a symmetric distribution) and less powerful against others (e.g., different
location parameter). If $F_n$ is the empiric distribution function of the sample, then the sta-
tistic, proposed by Dr. Harold Lischner and the author, is
$\max_x [F(x) - F_n(x)] - \min_x [F(x) - F_n(x)]$; more precisely, it bears the same relationship to that mentioned as
does Pyke’s (Ann. Math. Stat., Vol. 30 (1959), pp. 568-576) modification of Kolmogorov’s
to Kolmogorov’s itself. Asymptotic formulas due to Doob (Ann. Math. Stat., Vol. 20 (1949),
pp. 393-403) and Donsker (Ann. Math. Stat., Vol. 23 (1952), pp. 277-281) yield the asymp-
totic distribution. A theorem of Sparre Andersen (Skand. Aktuarietidskrift, Vol. 36 (1953),
pp. 123-138) makes possible an essential simplification of the problem of determining the
distribution for finite sample size. After this simplification, methods developed for Kolmo-
gorov’s statistic by Kolmogorov, Feller, Dempster and others can be used. Tables are in
preparation.
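For a continuous F the unmodified form of the statistic, the maximum of F − F_n minus the minimum of F − F_n, can be evaluated at the order statistics. The sketch below computes only that unmodified form; the Pyke-type modification mentioned in the abstract is not attempted, and the function name and sample are hypothetical.

```python
def range_statistic(sample, cdf):
    """max_x [F(x) - F_n(x)] - min_x [F(x) - F_n(x)] for a continuous cdf F."""
    xs = sorted(sample)
    n = len(xs)
    # Just to the left of the ith order statistic, F_n equals (i - 1)/n.
    upper = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    # At the ith order statistic, F_n equals i/n.
    lower = min(cdf(x) - i / n for i, x in enumerate(xs, start=1))
    return upper - lower

def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)

stat = range_statistic([0.75, 0.25, 0.5], uniform_cdf)
```

Because the statistic depends on the sample only through the values F(x_i), it is distribution-free under the null hypothesis, as the abstract states.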


3. Evaluation and Design of Multiple Choice Questionnaires (Preliminary 
report). H. Chernoff, Stanford University.


The evaluation of tests based on multiple choice questions is complicated by the fact 
that subjects who do not know the answer have the opportunity to guess. If only 1/3 of the
subjects answer correctly a question with three choices, one may infer that very few sub-
jects knew the answer. The standard procedure of giving credit to those who answered this
question correctly serves no purpose but to introduce “noise” into the test score. In this
paper, a method is proposed to give scores which depend not only on the individual’s an-
swers but on those of the population. The score x is regarded as an estimate of a value v
attached to the subject’s knowledge. The method of scoring is to be selected so as to mini-
mize $E(x - v)^2$. Applications to various models are given. The problem of optimal design
is discussed. The mathematics of the theory reduces to that of regression or conditional
expectation. For example, the best score given the subject’s response z is $x(z) = E\{v \mid z\}$.
If, in the example described above, p is the proportion of students who answer a one-point 
question correctly, all correct answers are given a score of (3p — 1)/2p and all incorrect 
answers a score of zero. 
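The score (3p − 1)/2p can be recovered from a simple model, sketched here as an assumption rather than as the author's derivation: a fraction k of subjects know the answer (v = 1) and always answer correctly, while the rest (v = 0) guess uniformly among c choices. Then p = k + (1 − k)/c, so k = (cp − 1)/(c − 1), and E{v | correct} = k/p. The function name is hypothetical.

```python
from fractions import Fraction

def score_for_correct(p, choices=3):
    """E{v | correct answer} under the know-or-guess model described above.

    p is the observed proportion of correct answers; it is assumed that
    1/choices <= p <= 1, so that the implied share of knowers is in [0, 1].
    """
    p = Fraction(p)
    knowers = (choices * p - 1) / (choices - 1)   # k = (c p - 1)/(c - 1)
    return knowers / p                            # = (c p - 1)/((c - 1) p)

half = score_for_correct(Fraction(1, 2))
```

With three choices this reduces to (3p − 1)/2p, and it gives a score of zero when p = 1/3, matching the remark that credit for such a question only adds noise.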


4. Optimal Accelerated Life Designs for Estimation. H. Chernoff, Stanford
University. (Invited paper)


It is desired to estimate the probability distribution of the lifetime of a device when 
subjected to a standard environment. If this lifetime tends to be large, the time consumed 
in testing a sample of devices will be exorbitant. A common approach is that of acceleration. 
The sample of devices is subjected to environments of greater stress than that of the stand- 
ard environment. To extrapolate these results a model relating distribution of lifetime to 
environment is required. Once such a model is proposed the problem of optimal design 
arises. At what level of stress should experiments be carried out? An answer to this ques- 
tion involves the cost of experimenting at a certain stress level. Optimal designs are de- 
rived for five examples. In each of these the distribution of lifetime is assumed to be ex- 
ponential with failure rate (reciprocal of the mean) equal to a specified function of stress 
and the cost of experimentation proportional to the mean lifetime at the stress level. Each 
of the optimal designs consists of experimenting at two stress levels. The method reduces 
to an application of a technique of Elfving derived for estimating the optimal designs for 
the coefficients of a linear regression. 


5. Minimum Risk Estimation: A Non-Parametric Case Involving Percentiles. 
Herbert B. Eisenberg and John E. Walsh, System Development
Corporation. 


Minimum risk estimation of 100p-percentile, $\theta_p$, is considered for any continuous uni-
variate population. Sample values are $x_1 \le \cdots \le x_n$; $x_0 = -\infty$; $x_{n+1} = \infty$. Risk =
$\sum_{i=0}^{n} L(\hat\theta_p, \theta_p^{(i)} \mid x_i < \theta_p \le x_{i+1}) \Pr(x_i < \theta_p \le x_{i+1})$ and estimate is value $\hat\theta_p$ minimizing
this function; $\theta_p^{(i)}$ in conditional loss function is representative value for $\theta_p$ given that $\theta_p$ is
in $(x_i, x_{i+1}]$ and can be randomly selected from a specified distribution over this interval.
Method is applicable when the minimizing $\hat\theta_p$ is unique. If conditional loss function is
$(\hat\theta_p - \theta_p^{(i)})^2$, minimizing $\hat\theta_p = \sum_{i=0}^{n} \theta_p^{(i)} \binom{n}{i} p^i (1 - p)^{n-i}$. Conditions for applicability of
method are examined. Explicit results are derived for special classes of loss functions.
Further nonparametric applications (univariate and multivariate) are being developed 
through extension of percentile concept to that of general population coverage—specifi- 
cally, to minimum risk estimation of regions with specified amounts of population coverage. 
Method is also applicable to estimation of parameters for parametric populations (con- 
tinuous or discrete and univariate or multivariate). In general, the confidence intervals 
$x_i < \theta_p \le x_{i+1}$ are replaced by mutually exclusive confidence regions that include all the
possible values for the quantity estimated. For the continuous parametric case, size of all 
but tail confidence regions ordinarily can be made arbitrarily small. Then, in the limit 
risk function is usually expressible as an integral. 







6. Non-Parametric Sequential Tests for the Two-Sample and Several-Sample 
Problems. R. R. M. Geoghagen and John E. Walsh, System Develop-
ment Corporation.


Null hypothesis asserts that k specified populations ($k \ge 2$) are equal. Populations can
be univariate or multivariate and arbitrary otherwise. Same number of sample values is
taken from given population at each sequential step; this number can be different for each
population. Each observation is converted to categorical form by subdivision (same for all
populations) of space of possible values. Categories are denoted by 1, ..., C and should
have nonzero null probabilities. At each sequential step, observed value is vector with
dimensionality equal to total number of observations per step and each coordinate is one 
of 1, --- , C. A value with all coordinates equal is discarded. Disjoint classes are formed 
from remaining possible vector values so that each class has a determined null probability 
(many possible ways unless C very small). In all cases, these classes have a multinomial 
distribution. The completely determined null distribution can be tested against specified 
alternative distributions by standard sequential analysis methods. Also some special se- 
quential methods are presented. When C is not very small, alternatives of a broad class 
can be emphasized by suitable choice of category subdivisions and suitable formation of 
classes. Procedures for making suitable selections are outlined for several univariate cases 
and a multivariate case. Also methods for determining category subdivisions with nonzero 
null probabilities are considered. 


7. Three-Quarter Replicates of $2^n$ Designs. Peter W. M. John, California
Research Corporation and University of California, Berkeley. 


Let a three-quarter replicate be formed by omitting the quarter defined by I = P =
Q = PQ, where P, Q, PQ are main effects or interactions. Let R be any other effect. Then
if one member, say PQR, of the alias coset R, PR, QR, PQR is negligible, a priori, the least
squares estimates of the remaining members are obtained from half replicates confounded
only with PQR. If, in addition, QR is negligible, then the least squares estimates of R and
PR are obtained by averaging their estimates from two half replicates, and are confounded
only with QR and PQR. In the three-quarter replicate of $2^n$ obtained by omitting I = AB =
AC = BC, the alias cosets consist either of four effects with odd numbers of letters or else
of four effects with even numbers of letters. In each of the alias cosets containing “odd”
effects, equate three of these to main effects, and we have a design of Type B (main effects
clear of two factor interactions) for $3 \cdot 2^{n-3}$ factors with $3 \cdot 2^{n-2}$ points. Other designs ob-
tained include a three-eighths replicate, 48 points, of $2^7$ of Type B′ (both main effects and
two factor interactions clear) and a six point first-order design for five factors.


8. Oddities in Estimating the Scale Parameter of the Weibull Distribution. 
Eugene H. Lehman, Jr., Purdue University. (Introduced by Irving W.
Burr.) 


In a recent paper [Estimation of the Scale Parameter in the Weibull Distribution Using 
Samples Censored by Time and by Number of Failures, Ph.D. Thesis, North Carolina State 
College, Raleigh, (1961)] the author described a test procedure for estimating the scale 
parameter $\alpha$ in the Weibull distribution, $F(t) = 1 - \exp[-(t^M/\alpha)]$, t > 0. The test is:
place on test N tubes whose life spans follow the Weibull distribution with known shape
parameter M; stop test when both R tubes have failed and T hours have elapsed. The maxi-
mum likelihood estimator $\hat\alpha$ of $\alpha$ from this procedure possesses several mysterious idiosyn-
crasies. In the present paper some of these strange mannerisms are explained. For instance,
the variance, V, and mean square error, D, (if N and R are small) increase with T for a
short interval of small values of T, thus implying that a short test supplies more informa-
tion than a long one. The reason for this peculiarity is that if the actual number of failures,
r, exceeds by only a small integer the minimum R, then $\hat\alpha$ is very biased and variant. If T
is small, the probability of using this particular form of $\hat\alpha$ increases with T before it de-
creases, and thus V and D of $\hat\alpha$ as a whole portray this same confusing oddity.


9. The Distribution of Probabilities in a Stochastic Learning Model (Preliminary
report). J. R. McGregor and T. V. Narayana, University of Alberta.
(By title)


Bush and Mosteller (1955) have developed a model to describe simple learning behavior 
in the case of two subject-controlled events. They consider a learning experiment in which a 
subject is presented, on repeated trials, with the same two alternatives $A_1$ and $A_2$. Their
probabilistic model specifies the way in which the response probability (i.e., the probability
of choice of alternative $A_1$) is modified from trial to trial as a result of earlier responses and
outcomes. On the nth trial there will be $2^n$ possible values of the response probability (called
p-values), the $\nu$th value being denoted by $p_{\nu n}$ ($\nu = 1, 2, \ldots, 2^n$). These p-values have a
probability distribution, say, Prob($p_{\nu n}$) = $P_{\nu n}$. In a special case, the transition from trial
n to trial (n + 1) results in each old p-value $p_{\nu n}$ generating two new p-values given by
$Q_1 p_{\nu n} = \tfrac12 p_{\nu n}$, $Q_2 p_{\nu n} = \tfrac12 + \tfrac12 p_{\nu n}$. It is the purpose of this note to show that in the special
case considered, the use of binary representations of integers or equivalently of the boolean 
algebra formed by the subsets of a set yields interesting results about the exact asymptotic 
distribution. 
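The link with binary representations can be seen by generating the p-values directly. The sketch below assumes the two operators are p → p/2 and p → 1/2 + p/2 (the printed operators are partly illegible in this copy, so this reading is an assumption) and a hypothetical initial probability p0; under that reading the 2^n values after n transitions are (k + p0)/2^n for k = 0, 1, ..., 2^n − 1.

```python
from fractions import Fraction

def p_values(p0, n):
    """All response probabilities reachable after n transitions from p0."""
    values = [Fraction(p0)]
    for _ in range(n):
        # Each old value p generates the two new values p/2 and 1/2 + p/2.
        values = [q for p in values for q in (p / 2, Fraction(1, 2) + p / 2)]
    return values

p0 = Fraction(1, 3)            # hypothetical initial response probability
generated = sorted(p_values(p0, 3))
```

Each generated value corresponds to one integer k whose n binary digits record which operator was applied at each trial.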


10. Some Properties of Compositions of an Integer and their Application to 
Probability and Statistics. T. V. Narayana and S. G. Mohanty, University
of Alberta. (By title) 


The authors have developed a unified approach to various problems in probability and 
statistics, through partial orders defined on compositions of an integer (abstract published 
in Ann. Math. Stat., June, 1961). This approach yields a new proof of a theorem of Chung 
and Feller: Let $L_{k,n}$ be the number of minimal lattice paths from the origin to (n, n) such
that 2k steps lie above the line x = y and 2n − 2k steps below it (k = 0, 1, ..., n); then
$L_{k,n} = (n + 1)^{-1} \binom{2n}{n}$, independently of k. A feature of our (non-inductive) proof is the
explicit 1:1 correspondences which are provided by it, between any two of the n + 1 sets
of paths considered. Similar methods may be applied to the problem of setting an upper 
bound to the number of non-isomorphic scores in a round robin tournament. 
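The Chung and Feller count can be checked by brute force for small n. This sketch enumerates all minimal lattice paths and is not the authors' correspondence; it assumes a step counts as "above" the line x = y when its midpoint satisfies y > x, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb

def counts_by_steps_above(n):
    """counts[k] = number of paths from (0, 0) to (n, n) with 2k steps above x = y."""
    counts = [0] * (n + 1)
    for north_positions in combinations(range(2 * n), n):
        north = set(north_positions)
        x = y = above = 0
        for step in range(2 * n):
            if step in north:
                # Vertical step from (x, y) to (x, y + 1): above when y >= x.
                if y >= x:
                    above += 1
                y += 1
            else:
                # Horizontal step from (x, y) to (x + 1, y): above when y > x.
                if y > x:
                    above += 1
                x += 1
        counts[above // 2] += 1
    return counts

counts = counts_by_steps_above(4)
```

For each n the n + 1 counts should all equal the same number, the binomial coefficient of 2n over n divided by n + 1.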


11. Some Tests for Outliers. C. P. Quesenberry, Montana State College,
and H. A. David, Virginia Polytechnic Institute.


This paper is concerned with the problem of detecting outlying observations when in 
addition to the normal sample $x_1, x_2, \ldots, x_n$ at hand an independent mean-square esti-
mate $s_\nu^2$ of the common variance is available. The maximum deviate from the sample mean
($x_{\max} - \bar{x}$) divided by the pooled estimate of standard deviation S and the maximum
absolute deviation from the sample mean $\max_i |x_i - \bar{x}|$ divided by S are considered as test
statistics. These statistics are known to have optimal power properties for testing against
alternatives of one outlier on one (specified) side of the sample and for one outlier on either
side of the sample, respectively. The distributions of the related statistics $(x_i - \bar{x})/S$ are
obtained and used in conjunction with the Bonferroni inequalities to obtain significance
points for the one- and two-sided test statistics. Tables of significance points of the test
statistics for $\alpha$ = .01 and .05 are given for selected values of n (sample size) and $\nu$ (the
degrees of freedom for the independent estimate of variance). Two examples are given to
illustrate use of the tables.


12. Estimation of Location and Scale Parameters by Optimally Selected Obser- 
vations. Carl-Erik Särndal, University of North Carolina. (Introduced
by Bernard G. Greenberg.) 


A large sample of n observations, distributed according to a function of type
$F[(x - \alpha_1)/\alpha_2]$, is available. Those k observations $x_{(n_1)} < x_{(n_2)} < \cdots < x_{(n_k)}$ which con-
tain most information with respect to the unknown parameters $\alpha_1$ and $\alpha_2$ are to be selected
to form linear estimates $\alpha_r^* = \sum_{i=1}^{k} a_{ri} x_{(n_i)}$, (r = 1, 2). Another question to be answered
is to what extent this censoring of the sample can be carried out in order to maintain the
quality of joint asymptotical efficiency. If $\lambda_i = n_i/(n + 1)$, it is found that, under certain
general conditions, the optimum spacings are of the form $\lambda_i = Q[i/(k + 1)] + O(k + 1)$,
where Q is a function, the inverse of which is expressible in terms of the density function
and its derivatives. Using this result, the joint asymptotical efficiency of $\alpha_1^*$ and $\alpha_2^*$ is shown to
be of the form $1 - C(k + 1)^{-2} + o(k + 1)^{-2}$, where C is a certain constant. In fact, joint
asymptotical efficiency is achieved by using only k = k(n) observations, where $k(n) \to \infty$
with n in the weakest possible manner. It is to be expected that, also for rather small values
of k, satisfactory approximations to the optimum spacings can be obtained by taking
$\lambda_i = Q[i/(k + 1)]$. For the important case of the normal distribution, calculations show that
these “nearly optimum”’ spacings yield estimates of very high joint asymptotical efficiency 
by utilizing only 10 observations. 


13. On the Jiřina Sequential Tolerance Limits Procedure. Sam C. Saunders,
University of Wisconsin. 


This paper contains a short exposition of the assumptions necessary for the Jiřina
Sequential Tolerance Limits Procedure (Selected Trans. in Math. Stat. and Prob., I.M.S. &
A.M.S., Vol. 1, 1961, pp. 145-156) to be generalized to spaces other than the real line. A method
of obtaining the confidence level for any parameter values is given. A table is computed
for small values and for large values it is shown that tables of the exponential integral can
be used. Formulae for the expectation and variance of the random sample size are derived.
That the random sample size, appropriately scaled by one of the parameters, has an asymp-
totic distribution as that parameter increases is proved and the Laplace transform of this
distribution is found. Also formulae for the asymptotic mean and variance are found as
one parameter is increased. 


14. Moments of the Radial Error. Ernest M. Scheuer, Space Technology
Laboratories. 


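Let $x_1$ and $x_2$ be normally distributed random variables with zero means, variances
$\sigma_{11}$ and $\sigma_{22}$, respectively, and covariance $\sigma_{12}$. Define the radial error R by $R = (x_1^2 + x_2^2)^{1/2}$.
Further, let $\sigma_1^2 = \tfrac12\{(\sigma_{11} + \sigma_{22}) + [(\sigma_{11} - \sigma_{22})^2 + 4\sigma_{12}^2]^{1/2}\}$,
$\sigma_2^2 = \tfrac12\{(\sigma_{11} + \sigma_{22}) - [(\sigma_{11} - \sigma_{22})^2 + 4\sigma_{12}^2]^{1/2}\}$,
and $k^2 = (\sigma_1^2 - \sigma_2^2)/\sigma_1^2$. Then for the moments about the origin $\mu_p$ of R, one obtains:


$$\mu_{2p} = \frac{(2\sigma_1^2)^p\, p!}{\pi^{1/2}} \sum_{r=0}^{p} \binom{p}{r} (-k^2)^r \frac{\Gamma(r + \tfrac12)}{r!} \quad\text{and}\quad \mu_{2p+1} = 1 \cdot 3 \cdots (2p + 1)\,(\pi/2)^{1/2}\, \sigma_1^{2p+1}\, F(-\tfrac12(2p + 1), \tfrac12, 1; k^2),$$


where F(a, b, c; z) is the hypergeometric function. Cf. the results
of Rice (in Wax, Ed., Selected Papers on Noise and Stochastic Processes, Dover, 1954, p.
238 et seq.) who treats the case $\sigma_{12} = 0$, $\sigma_{11} = \sigma_{22}$, $E(x_1) = P$ (generally not zero), $E(x_2) = 0$.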
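The even-moment series is easy to evaluate and to check against moments that are known directly. The following sketch assumes the reading of the formula as a sum over r of binomial(p, r) (−k²)^r Γ(r + 1/2)/r!, scaled by (2σ_1²)^p p!/π^{1/2}; the function name and the numerical inputs are illustrative only.

```python
import math

def even_moment(p, s11, s22, s12):
    """E[R**(2p)] for R = sqrt(x1**2 + x2**2), from the series in k**2."""
    root = math.sqrt((s11 - s22) ** 2 + 4.0 * s12 ** 2)
    var1 = 0.5 * (s11 + s22 + root)      # larger principal variance
    var2 = 0.5 * (s11 + s22 - root)      # smaller principal variance
    k2 = (var1 - var2) / var1
    series = sum(
        math.comb(p, r) * (-k2) ** r * math.gamma(r + 0.5) / math.factorial(r)
        for r in range(p + 1)
    )
    return (2.0 * var1) ** p * math.factorial(p) / math.sqrt(math.pi) * series

# Illustrative covariance structure: variances 2 and 1, covariance 0.5.
second = even_moment(1, 2.0, 1.0, 0.5)   # should equal s11 + s22
fourth = even_moment(2, 2.0, 1.0, 0.5)   # should equal 3 s11^2 + 3 s22^2 + 2 s11 s22 + 4 s12^2
```

The two comparison values follow from the ordinary second and fourth moments of a bivariate normal vector, so agreement is a useful check on the reconstruction of the series.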







15. An Optimal Sequential Accelerated Life Test with Exponential Dependence 
on Stress. Gideon Schwartz, Stanford University.


Bessler, Chernoff and Marshall present a game-theoretic method for obtaining optimal 
sequential designs for experiments to test whether a device subjected to standard stress 
has an expected lifetime exceeding a specified value. They solve explicitly the case in which
the expected lifetime, as a function of stress, is the inverse of a quadratic. In the present
paper, the same method is applied to the case where the dependence of the expected lifetime on the stress is exponential.


16. Tests for Regression Coefficients When a Continuous Sample is Available. 
M. M. Siddiqui, Boulder Laboratories, National Bureau of Standards.


In continuation of previous studies (these Annals, Vol. 29 (1958), pp. 1251-56; Vol. 31 
(1960), pp. 929-938), tests for regression coefficients are developed when a continuous 
sample is available and the error process is assumed to be stationary. Bivariate Gram- 
Charlier series are used to approximate the joint distribution of the least-squares estimate 
of a regression coefficient and the integral of squared residuals. Finally, approximate t
and F tests are suggested. The theory is applied to some cases of interest such as analysis 
of variance for one way classification, and polynomial trend. 


17. A Generalization of the Ballot Problem and Its Application in the Theory of 
Queues (Preliminary report). Lajos Takács, Columbia University.
(Invited paper) 


In the case of single server queueing processes denote by $G_n(x)$ the probability that the
busy period consists of n services and its length is $\le x$. The author determines $G_n(x)$ ex-
plicitly for types $[M, E_m, 1]$ and $[E_m, M, 1]$ by using the solution of the ballot problem and
for types [M, G, 1] and [G, M, 1] by using the following more general theorem: Suppose
that an urn contains n cards labeled with nonnegative integers whose sum is $k \le n$. All
the n cards are drawn without replacement from the urn. The probability that for r =
1, 2, ..., n the sum of the first r numbers drawn is less than r is 1 − k/n.
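The urn theorem can be verified exactly for a small deck by enumerating every drawing order. This is a brute-force check rather than a proof, and the card labels used are hypothetical examples.

```python
from fractions import Fraction
from itertools import permutations

def prob_partial_sums_below(cards):
    """P(sum of the first r numbers drawn < r for every r = 1, ..., n)."""
    favorable = total = 0
    for order in permutations(cards):     # cards are treated as distinguishable
        total += 1
        running = 0
        ok = True
        for r, value in enumerate(order, start=1):
            running += value
            if running >= r:
                ok = False
                break
        favorable += ok
    return Fraction(favorable, total)

# n = 4 cards with sum k = 3, so the theorem gives 1 - 3/4.
prob = prob_partial_sums_below([0, 0, 1, 2])
```

When every card is labeled 0 or 2 the statement reduces to the classical ballot problem, which is the sense in which the theorem generalizes it.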





NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest 
Personal Items 


Dr. Max Astrachan, Logistics Department, The RAND Corporation, has 
received a part-time appointment to the faculty of the Graduate School of 
Business Administration, University of California at Los Angeles, as Lecturer 
(Professor). 

William S. Bennett has joined Sylvania’s Reconnaissance Systems Laboratory,
Mountain View, Calif. Formerly an operations analyst at Johns Hopkins’ Opera-
tions Research Office, he remains a doctoral candidate in statistics at The
American University. 

Benjamin Buchbinder has accepted a position with the General Electric 
Company, Special Programs Section, Defense Systems Department, as engi- 
neer-systems analyst. 

Dr. William R. Buckland has resigned from London Transport Executive to 
take charge of the new Statistical Advisory Service of The Economist Intelli- 
gence Unit Ltd., 22, Ryder Street, London, S.W.1.

Dr. Maria Castellani has accepted a new position as Professor of Mathematics 
and Chairman of the Mathematics Department, Fairleigh Dickinson University, 
Teaneck, N. J. Dr. Castellani was formerly Chairman of the Mathematics
Department and Lina Hasq Professor, University of Kansas City, Kansas 
City, Mo. 

Martin Dorff completed his Ph.D. in statistics at Iowa State University in 
June 1960, whereupon he joined the staff of the Department of Mathematics 
at the University of Maine. 

Dr. David B. Duncan has accepted a position as Professor of Statistics in the 
Department of Biostatistics, School of Hygiene and Public Health, Johns Hop- 
kins University, Baltimore, Maryland. Dr. Duncan was formerly a Visiting 
Professor in the Department of Statistics, University of North Carolina. 

Dennis J. G. Farlie is currently on leave of absence from the General Electric 
Company (England) to work at the Bell Telephone Laboratories for one year, 
October 1960—October 1961. 

Dr. J. Hemelrijk, formerly Professor in Statistics at the Technical University, 
Delft, Holland, has been appointed Professor in Mathematical Statistics at the 
University of Amsterdam. He has also been appointed Head of the Statistical 
Department of the Mathematical Centre, 2e Boerhaavestraat 49, Amsterdam
(O), Holland.

Mr. Seymour Jablon, Statistician in the Division of Medical Sciences, Na- 
tional Academy of Sciences—National Research Council, is currently in Japan 
as head of the statistics department, Atomic Bomb Casualty Commission, a 
research agency of the NAS-NRC operated in cooperation with the Japanese 
Government. With a staff of U.S. and Japanese statisticians and related medical
scientists he is devoting himself to the study of late radiation effects of the bombs
dropped on Hiroshima and Nagasaki. Mr. Jablon and Dr. Gilbert W. Beebe 
are currently rotating this responsibility, Dr. Beebe’s first tour having been 
1958-1960. 

Mr. J. N. K. Rao, an advanced graduate student in Statistics at Iowa State 
University, has been selected as the recipient of the George W. Snedecor Award 
in Statistics, which consists of a cash prize of $25.00, a year’s membership in the 
Institute of Mathematical Statistics and a year’s subscription to the Annals of 
Mathematical Statistics. 


P. V. Rao has resigned his position as lecturer in statistics in the College of
Science, Nagpur, India and has joined the Department of Mathematics of the
University of Georgia at Athens, Georgia, on a scholarship awarded by the
Georgia Rotary Students Fund.


David Rothman has accepted a position as Member of the Technical Staff at
Systems Laboratories, a division of Electronic Specialty Co., 5121 San Fernando
Road, Los Angeles 39, California.

Robert R. Read is currently a visiting assistant professor in the Depart- 
ment of Statistics at the University of Chicago. 

Esther Seiden, formerly at Northwestern University, Evanston, Illinois, has
been appointed associate professor at Michigan State University, East Lansing,
Michigan.




NEW MEMBERS 
The following persons have been elected to membership in the Institute 


Alberda, Willis J., A.B. (Calvin College); Student, Montana State College, Bozeman,
Montana; RR no. 1, Box 128, Manhattan, Montana. 

Anderson, Donald A., B.A. (Macalester College); Graduate Student, University of Ne- 
braska, Lincoln, Nebraska; 535 North 16th St., Lincoln, Nebraska. 

Antelman, Gordon R., M.A. (University of Minnesota); Research Associate, Harvard Uni- 
versity, Graduate School of Business Administration, Soldiers Field, Boston 63, Mass.

Arkin, Herbert, Ph.D. (Columbia University); Professor, City College of New York, 17 Lex-
ington Ave., New York, N. Y.

Arora, Manmohan Singh, M.A. (Delhi School of Economics, India); Research Assistant, 
Department of Economics, University of Minnesota, Minneapolis 14, Minn. 

Bailey, Daniel E., Ph.D. (University of California); Postdoctoral Research Fellow, De- 
partment of Statistics, University of California, Berkeley 4, California. 

Batschelet, Edward, Professor (Extraordinarius), (University of Basle, Switzerland); 
Visiting Professor, Department of Mathematics, The Catholic University of America, 
Washington 17, D.C. 

Battenberg, Robert A., Ph.D. (Stanford University); Chief, Mathematical and Statistical 
Analysis, Personnel Laboratory, Wright Air Development Division, USAF, Box 1557, 
Lackland A.F.B., San Antonio, Texas; 2666 W. Summit, San Antonio, Texas. 

Bayo, Francisco, M.S. (University of Michigan); Actuary, Division of the Actuary, Social 
Security Administration, Washington 25, D.C. 

Bein, Norman, B.A. (Brooklyn College); Mathematician, Aerojet-General Corp., Dept. 6620, 
P.O. Box 1947, Nimbus, California. 

Bhattacharyya, Bibhuti Bhushan, Ph.D. (London School of Economics and Political 
Science); Visiting Associate Statistician, North Carolina State College, Department of 
Experimental Statistics, Raleigh, N. C. 

Black, Barbara C., B.S. (Florida A and M University); Instructor of Mathematics, Florida 
A and M University, Tallahassee, Florida; Box 122, Florida A and M University, Talla- 
hassee, Florida. 

Blake, William Henry, Jr., A.B. (The George Washington University); Student, George 
Washington University; 2121 H St., Apt. 404, Washington 7, D.C. 

Bograd, Naomi, A.B. (Bryn Mawr College); Associate Member of Technical Staff, Bell 
Telephone Laboratories, Whippany, New Jersey. 

Brandt, Edward N., Jr., M.D. (University of Oklahoma); Intern in Medicine, VA Hospital, 
921 N.E. 13th, Oklahoma City, Okla.; Rt. 4, Box 285, Oklahoma City 11, Oklahoma. 

Branson, Michael, Ph.D. (University of London); Research Mathematician, Carnegie 
Institute of Technology, Schenley Park, Pittsburgh 13, Pa.; 4947 Wallingford Street, 
Pittsburgh 13, Pa. 

Breiman, Leo, Ph.D. (University of California); Assistant Professor, University of Cali- 
fornia at Los Angeles, Department of Mathematics, Los Angeles 24, California. 

Brown, Geoffrey John Hatton, Mathematician, Regional Highway Planning Committee, 
499 Pennsylvania Avenue, N.W., Washington 1, D. C.; 1133 24th Street, N.W., Apt. C, 
Washington 7, D.C. 

Clelland, Richard C., Ph.D. (University of Pennsylvania); Assistant Professor, University 
of Pennsylvania; Department of Statistics, Dietrich Hall, University of Pennsylvania, 
Philadelphia 4, Pennsylvania. 

Cobb, E. Benton, B.Sc. (University of Nebraska); Graduate Student in Mathematics, 
University of Nebraska, Lincoln, Nebraska; 3200 "R" St., Lincoln, Nebraska. 

Colombo, Bernardo, "Laurea" in Economics and Commerce (Catholic University of Milan); 
Professor of Statistics, Instituto Universitario di Venezia, Facoltà di Economia e 
Commercio, Ca' Foscari, Venice, Italy. 

Costello, Donald F., M.S. (Notre Dame); Teaching Assistant, Department of Mathematics, 
University of Nebraska, Lincoln 8, Nebraska; 1305 S. 13th St., Lincoln, Nebraska. 

Drier, Karen, A.B. (The George Washington University); Graduate Teaching Assistant, 
The George Washington University; 925 25th St., N.W., Apt. 615, Washington 7, D.C. 

DeLisser, Oswald George, B.Sc. (University College of West Indies); Agricultural Officer, 
Division of Economics and Statistics, Jamaica, West Indies, Hope, Kingston 6, Ja- 
maica, W. I.; Department of Statistics, University of Aberdeen, Meston Walk, Aberdeen, 
Scotland. 

Fair, William R., M.S. (Stanford University); President, Fair, Isaac and Co., Inc., 156 Mont- 
gomery St., San Francisco, California. 

Friedman, Herman P., M.A. (Brooklyn College); Senior Mathematician, System Develop- 
ment Corp., Paramus, N. J.; 765 Riverside Drive, New York 32, N. Y. 

Fukushima, Kozo, B.S. (Roosevelt University); Research Assistant, Department of Sta- 
tistics, University of North Carolina; 320 Connor Dorm., Chapel Hill, N.C. 

Geisler, Murray A., M.A. (Columbia University); Manager, Logistics Systems Laboratory, 
Rand Corporation, 1700 Main St., Santa Monica, California; Department of Mathemati- 
cal Statistics, Stanford University, Stanford, California. 

Glickstein, Aaron, Ph.D. (Purdue University); Senior Analyst, Mathematics Section, 
Midwest Research Institute, 425 Volker Blvd., Kansas City 10, Missouri. 

Goldman, Aaron S., M.S. (Oklahoma State University); Statistician, Los Alamos Scientific 
Laboratory, P.O. Box 1663, Los Alamos, New Mexico. 

Goodman, Nathaniel R., Ph.D. Math. (Princeton University); Member of the Technical 
Staff, Space Technology Laboratories, Inc., P.O. Box 95001, Los Angeles 45, California. 

Grizzle, James E., Ph.D. (North Carolina State College); Assistant Professor, Department 
of Biostatistics, School of Public Health, University of North Carolina, Chapel Hill, N.C. 





Harris, Bernice, M.B.A. (New York University); Research Assistant, Yale University 
Medical School, New Haven 11, Conn.; 3 Riverside Drive, New York 28, N.Y. 

Harris, Chester W., Ph.D. (University of Chicago); Professor of Education, University of 
Wisconsin, Madison 6, Wisconsin. 

Harrow, Martin, B. Sc. (McGill University); Assistant Professor of Mathematics, Sir 
George Williams University, Drummond St., Montreal 25, Quebec, Canada. 

Hiorns, Robert W., M.A. (University of Edinburgh); Statistician, Medical Research Council, 
Department of Human Anatomy, University of Oxford, Oxford, England. 

Iglehart, Donald L., M.S. (Stanford University); Research Assistant, Statistics Department, 
Stanford University, Stanford, California. 

Jizba, Zdenek V., Ph.D. Geology (University of Wisconsin); Research Geologist, California 
Research Corporation, P.O. Box 446, La Habra, California. 

Kane, Edward James, Ph.D. (Massachusetts Institute of Technology); Assistant Professor 
of Economics, Iowa State University, Ames, Iowa. 

Koop, John Clement, Ph.D. (North Carolina State College); Assistant Professor of Experi- 
mental Statistics, North Carolina State College, P.O. Box 5457, Raleigh, N.C. 

Korin, Basil P., M.S. (Stanford University); Mathematical Statistician, Bureau of the 
Census, Washington 25, D. C.; 22 3rd St., S.E., Washington 3, D. C. 

Kraft, Thomas L., M.B.A. (UCLA); Senior Engineer, Statistical Analysis, Philco-WDL; 
3875 Fabian Way, Palo Alto, California; 787 Sunshine Drive, Los Altos, Calif. 

Lamperti, John W., Ph.D. (California Institute of Technology); Assistant Professor, De- 
partment of Mathematics, Stanford University, Stanford, California. 

Laska, Eugene M., M.S. (New York University); Student, New York University; 1 Remsen 
Road, Yonkers, N.Y. 

Lewis, Peter A. W., M.S. (Columbia University); Engineer, International Business Machines 
Corp., Watson Laboratories, 612 W. 116 St., N. Y. 25, N.Y. 

Long, John Marshall, Ed.D. (University of Virginia); Associate Professor and Acting 
Chairman, Department of Mathematics, Norfolk College of William and Mary, Nor- 
folk, Va.; 412 Fairfax Ave., Norfolk, Va. 

Lynch, Cornelius J., M.A. (Georgetown University); Mathematician, David Taylor Model 
Basin, Carderock, Maryland; Apt. M-636, 1111 Arlington Blvd., Arlington, Va. 

Maddala, G. S., M.Sc. (Bombay University, India); Graduate Student, University of Chi- 
cago; Hitchcock Hall, University of Chicago, Chicago 37, Illinois. 

Mantel, Nathan, M.A. (American University); Head, Experimental Statistics Section, 
National Cancer Institute, Bethesda 14, Maryland. 

Mark, Abraham M., Ph.D. (Cornell University); Professor of Mathematics, Southern 
Illinois University, Carbondale, Illinois; Department of Statistics, University of Cali- 
fornia, Berkeley 4, California. 

Marshall, Jack A., M.S. in Statistics (The University of Chicago); Student, University of 
Chicago, Chicago 37, Illinois; 7427 S. Chappel Ave., Chicago 49, Illinois. 

Menon, Manavashi Vijaya Krishna, Ph.D. (Ohio State University); Senior Associate 
Mathematician, I.B.M. Corporation, General Products Division, San Jose 14, Calif.; 
Department 498, Building 061, I.B.M. Corp., Monterey and Cottle Roads, San Jose 14, 
California. 

Meyer, Donald L.; School of Education, Syracuse University; 1646 S. State St., Syracuse 5, 
N.Y. 

Miller, Alan John, Ph.D. (Manchester University); Assistant Lecturer, Statistics Depart- 
ment, University College London, Gower Street, London W.C.1, England. 

Moder, Joseph J., Department of Statistics, Iowa State University, Ames, Iowa. 

Moss, David M., B.A. (The George Washington University); Student Assistant, Depart- 
ment of Statistics, The George Washington University; 1327 Leegate Road, Washington 
12, D.C. 





Mudholkar, Govind Shrikrishana, M.Sc. Statistics, M.Sc. Mathematics (The University 
of Poona, India); Research Assistant, Department of Statistics, University of North 
Carolina, Chapel Hill, N.C. 

Muller, Emil-Roy, B.Sc. Agric. (University of Natal, South Africa); Student (1), Lecturer in 
Biometry (2), University of Aberdeen (1), Faculty of Agriculture, University of Natal, 
South Africa (2) (on study leave); Department of Statistics, University of Aberdeen, 
Meston Walk, Old Aberdeen, Scotland. 

Nacif, Ernest, B.A. (UCLA); Quality and Reliability Engineering, Hughes Aircraft Co., 
El Segundo, California. 

Narayanan, Parameswaran, M.A. (Madras University, India); Examiner of Trade Marks, 
Trade Marks Registry, Central Government Buildings, Queen’s Road, Bombay; No. 
12, Annapurna Mandir, Adenwalla Road, King’s Circle, Bombay 19, India. 

Norman, Warren T., Ph.D. (University of Minnesota); Assistant Professor of Psychology, 
Department of Psychology, University of Michigan, Ann Arbor, Mich. 

O'Fallon, William M., M.A. (Vanderbilt University); Student, Department of Statistics, 
University of North Carolina, Chapel Hill, N. C.; 223 Connor Dorm., Chapel Hill, N.C. 

de Oliveira, J. Tiago, Doctor in Mathematics (Lisbon University); Assistant Professor, 
Faculty of Sciences of the Lisbon University; Faculdade de Ciencias, R. da Escola Poli- 
tecnica 58, Lisbon 2, Portugal. 

Orkand, Donald S., M.B.A. (New York University); Research Statistician, Operations Re- 
search Inc., 8605 Cameron St., Silver Spring, Md. 

Palmour, Vernon E., M.S. (University of Wyoming); Operations Analyst, Technical Opera- 
tions, Inc., 1485 G. Street, N.W., Room 310, Washington 5, D.C. 

Park, Heebok, B.S. (Seoul National University); Student, University of Chicago, Chicago 
37, Ill.; 1009 E. 57th St., Chicago 37, Ill. 

Qureishi, Abdus Salam, M.S. (Patna University, India); Graduate Assistant, Department 
of Mathematics, Case Institute of Technology, Cleveland 6, Ohio. 

Rényi, Alfred, Doctor’s Degree of the University of Szeged; Director, Mathematical Insti- 
tute of the Hungarian Academy of Sciences; Professor of Mathematics, University of 
Budapest; Budapest, VI. Benczur-u. 28. 11.8. Hungary. 

Rich, Judith M., B.A. (Marquette University); Student, Department of Statistics, Univer- 
sity of North Carolina; 321 Kenan Dormitory, Chapel Hill, N.C. 

Roberts, Charles D., B.S. (Georgia Institute of Technology); Student, Department of Sta- 
tistics, University of North Carolina; 405-A Smith Avenue, Chapel Hill, N.C. 

Robinson, Paul D., B.S. (Brooklyn College); Senior Actuarial Assistant, George B. Buck, 
Consulting Actuary, 60 Worth Street, New York, N. Y.; 141 East 19th Street, Brooklyn 26, 
N.Y. 

Rotkin, Israel, E. E. (C.C.N.Y.); Chief, Electromechanical Laboratory, Diamond Ordnance 
Fuze Laboratories, Washington 26, D.C. 

Ruben, Harold, Ph.D. (Imperial College of Science and Technology, London University); 
Head, Department of Statistics, The University, Sheffield 10, England. 

Sanders, Paul G., M.S. (Virginia Polytechnic Institute); Staff Statistician, Abbott Labora- 
tories, Fourteenth and Sheridan, North Chicago, Illinois. 

Schoeman, Hermanus S., M.Sc. (University of Pretoria); Lecturer, Department of Statistics, 
University of Stellenbosch, Stellenbosch, Cape Province, Union of South Africa. 

Seliskar, John L., A.B. (John Carroll University); Mathematical Statistician, Biometrical 
Studies, U. S. Forest Service, U. S. Department of Agriculture, South Building, Room 
3230, Washington 25, D.C. 

Shumway, Robert H., M.S. (Iowa State University); A-HSO, U.S. Public Health Service, 
Research Branch, Division of Rad. Health, Dept. of Health, Education and Welfare, 
Washington, D. C.; Arlington Towers M-732, Arlington 9, Va. 

Sibuya, Masaaki, Master of Engineering (Tokyo University); Research Member, Institute 
of Statistical Mathematics, 1 Azabu Huzimi-tyo, Minato-ku, Tokyo, Japan; 44 Sumi- 
yosi-tyo, Sinzyuku-ku, Tokyo, Japan. 

Smith, J. E. Keith, Ph.D. (University of Michigan); Staff Psychologist, Lincoln Laboratory, 
M.I.T., Lexington 73, Mass. 

Sorum, Marilyn, M.A. (University of Minnesota); Graduate Research Psychologist, Uni- 
versity of California Medical Center; Biomechanics Laboratory, 463 U. Hospital, U.C. 
Medical Center, San Francisco 22, California. 

Srivastava, Jagdish Narain, M.Sc. (Lucknow University, India); Student, Department of 
Statistics, University of North Carolina, Chapel Hill, N. C. 

Stewart, Leland T., M. Sc. (Stanford University); Research Engineer, Sylvania Electronic 
Defense Laboratory, Mountain View, California; 287 Edlee Ave., Palo Alto, California. 

Symonds, Gifford H., M.A. (Columbia University); Associate Professor of Management, 
Case Institute of Technology, University Circle, Cleveland 6, Ohio. 

Trumbo, Bruce E., A.B. (Knox College); Student, Department of Statistics, University of 
Chicago, Chicago 37, Illinois; 619 S. Columbia Ave., Springfield, Ill. 

Vail, Richard W., Jr., M.S. (Virginia Polytechnic Institute); Statistical Consultant, Aerojet 
General Corporation, Azusa, California; 508-A W. Huntington Drive, Arcadia, Cali- 
fornia. 

Varley, Thomas C., A.B. (The George Washington University); Graduate Teaching Assis- 
tant, The George Washington University, Washington, D. C.; 3489 S. Stafford, Arlington 
6, Virginia. 

Vincze, István, Doctor (Pázmány Péter University, Budapest); Assistant Director, 
Mathematical Institute of the Hungarian Academy of Sciences, Budapest, V. Reáltanoda 
u. 13-15, Hungary. 

Vind, Karl, Cand. Polit. (University of Copenhagen); Cand. Polit., Institute of Statistics, 
University of Copenhagen, Skt. Pedersstraede 19 I, Copenhagen K, Denmark; Skel- 
højen 77, Herlev, Denmark. 

Viswanathan, Thumbavanam V., M.A. (University of Madras); Senior Statistician, Sta- 
tistical Office, United Nations, New York; 147-23 Charter Road, Jamaica 35, N.Y. 

Warren, William G., M.Sc. (University of New Zealand); Forest Biometrician, Forest Re- 
search Institute, P. B. Whakarewarewa, Rotorua, New Zealand; 113 Connor Hall, Uni- 
versity of North Carolina, Chapel Hill, N.C. 

Winter, Robert F., M.B.A. (New York University); Teaching Fellow in Statistics, New York 
University, 100 Washington Square, New York 3, N. Y.; 47 Plaza Street, Brooklyn 17, 
N.Y. 

Wrage, Ernst G., Doctorium Rerum Naturalium (Universität Freiburg, Germany); Wissen- 
schaftlicher Assistent, Institut für Angewandte Mathematik u. Mechanik, Deutsche 
Versuchsanstalt für Luftfahrt, Freiburg i.Br., Hebelstr. 27, Germany; Operations Re- 
search Group, Dornier-Werke GmbH, Friedrichshafen, Bodensee, Germany. 

Yu, Hi Se, Assistant Professor, Department of Mathematics, College of Liberal Arts and Sci- 
ences, Chungnam University, Taejon, Korea. 

Zakich, Daniel, M.S. (Virginia Polytechnic Institute); Member of the Technical Staff, 
Hughes Aircraft Company, Ground Systems Group, P.O. Box 2097, Fullerton, Calif.; 
10201 Antigua Street, Anaheim, Calif. 

Zweerus, Hans; Assistant for Mathematical Statistics, Instituut voor Theoretische Bio- 
logie (Leiden University); Telderskade 2, Leiden, The Netherlands. 




TRAVEL GRANTS FOR ATTENDANCE AT THE INTERNATIONAL 
CONGRESS OF MATHEMATICIANS 


Travel grants will be made to a number of mathematicians who wish to attend 
the International Congress of Mathematicians in Stockholm, on August 15-22, 
1962. It is hoped that funds available through various sources may provide 
travel assistance for a considerable number of mathematicians. 

There will be a greater effort than in the past to give aid to younger people. 
As grants will be made only to those who have filed applications, it is urgent that 
any who wish to receive a grant should fill out and file an application. Younger 
people are urged to file applications so that their cases can be considered. Appli- 
cations can be obtained from the Division of Mathematics, National Academy 
of Sciences, National Research Council, Washington 25, D. C. by requesting an 
application for a travel grant to the 1962 International Congress. 

The deadline for filing of applications is November 1, 1961, and an attempt 
will be made to announce the grants by January 1, 1962. 

Grants will be awarded only to those persons whose applications 
have been received, in good order, by November 1. The selection will be made 
by a committee consisting of the regular Committee on Travel Grants of the 
Division of Mathematics of the National Academy of Sciences—National Re- 
search Council enlarged to include representatives of societies affiliated with the 
Division and representatives of various governmental agencies. 




CORRECTED ANNOUNCEMENT 
AMS—IMS TRANSLATION PROGRAM 


The IMS in conjunction with the AMS is sponsoring a series of translations 
of articles in foreign languages, particularly in Russian and Chinese. The first 
volume of 306 pages has now been published. Suggestions for articles of current 
interest are chiefly desired, although suggestions for older articles will also be 
welcome. Suggestions should be sent to the Chairman of the IMS committee 
dealing with these translations: Professor Wassily Hoeffding, Dept. of Statistics, 
University of North Carolina, Chapel Hill, North Carolina. 




BIOSTATISTICS PROGRAM AT THE JOHNS HOPKINS 
UNIVERSITY 


Beginning next September, the Department of Biostatistics at The Johns 
Hopkins University will offer an expanding program of study and research 
leading to Master of Science and Doctor of Science degrees. The curriculum 
has been modified by increasing the scope of the basic courses in statistical 
theory and statistical methods and by the addition of specialized courses in- 
cluding least squares and regression, stochastic processes, nonparametric 
methods, sampling and survey methods, biological assay, design of experiments 
and digital computer programming. These changes reflect a realization of the 
need for more intensively trained statisticians in the areas of biology, medicine 
and public health, and they are the direct result of the increasingly important 
role played by mathematics and statistics in all areas of scientific research. 

The new program was made possible in part by the recent addition to the 
department of Drs. Allyn W. Kimball and David B. Duncan who, together with 
Drs. Helen Abbey, Earl Diamond, John J. Gart and Margaret Martin, form the 
permanent staff. In September, 1961, Dr. Norman T. J. Bailey of Oxford Uni- 
versity will join the staff as Visiting Professor and will teach the course in 
stochastic processes. Additional appointments may be announced shortly. 

The department has a limited number of liberal fellowships available, and 
interested students are invited to write to the Chairman, Department of Bio- 
statistics, 615 North Wolfe Street, Baltimore 5, Md. for further information. 




FELLOWSHIPS FOR MATHEMATICS GRADUATES AT UNIVERSITY 
OF CINCINNATI 


The Division of Preventive Medicine and the Institute of Industrial Health 
of the College of Medicine, with the support of the Public Health Service, are 
instituting a training program in Epidemiology and Biostatistics. As part of 
this program training and experience will be given in Biology and Research to 
bright, young prospective candidates for a Ph.D. degree in Mathematics or 
Statistics. The prospective fellow should have his B.A. or B.S. in Mathematics. 
He should have decided on a department at which he will want to complete his 
graduate work. Above all he must have demonstrated a level of achievement 
that will make him an acceptable candidate for an advanced degree at any 
university. His program here will consist of two parts extending over a period 
of two years: 

First, he will continue to work for an advanced degree. His particular program 
will be designed in consultation with the Mathematics department of his 
choice and will emphasize areas of content suggested as prerequisites. At 
the same time he will prepare himself to take the examination required of 
candidates for the Master's degree in Mathematics at this University. This 
preparation will be done under tutorial arrangements made with the Depart- 
ment of Mathematics. Included in his curriculum will be tutorial readings in 
Advanced Probability and Biostatistics. 

Second, the student will receive training in Biology. He will be prepared, by 
tutorial arrangements during the first summer of his residency, for courses in 
the sophomore year in the Medical School. He will take Microbiology, 
Pharmacology, Industrial Toxicology, selected topics in Industrial Health 
problems as well as tutorials in Quantitative Epidemiology. 

The student will also be exposed to research projects, working with medical 
and other fellows in this program and in the Institute of Industrial Health. 
He will be expected to write a publishable paper on some aspect of Mathematical 
Biology at the end of his two-year stay. Arrangements may also be made for 
him to take additional work or gain additional research experience at another 
institution. The candidate will receive invaluable experience for a future career 
in the area of Biology or Mathematics and Statistics. The stipend will depend 
on qualifications and marital status and will range from $3400 to $4400 plus 
tuition. 





For further information write to Dr. Theodor D. Sterling, Department of 
Preventive Medicine, College of Medicine, University of Cincinnati, Cincinnati, 
Ohio. 




ADVANCED DEGREES IN STATISTICS AT THE UNIVERSITY OF 
NEBRASKA 


The Mathematics Department of the University of Nebraska announces the 
introduction of a curriculum leading to the M.A. or M.S. and Ph.D. degrees in 
Statistics. A bachelor’s degree program in Statistics is also under consideration 
at present. Courses are offered in Statistical Methods, Statistics for Engineers, 
Theory of Probability, Information Theory, Methods of Experimental Design, 
Stochastic Processes, Theory of Games and Statistical Decision Theory, and 
Topics in Probability and Statistics. Additional courses will be added as needed. 
A number of assistantships and fellowships are available to qualified students. 
For information concerning degree requirements and assistantships, 
direct inquiries to Professor Bernard Harris, Department of 
Mathematics, University of Nebraska, Lincoln 8, Nebraska. 




SUMMER STATISTICS COURSES AT PURDUE 


There will be three intensive courses in statistics and operations research 
offered at Purdue University this summer. 

A ten-day course on the Mathematical Techniques of Operations Research 
will be offered during June 5-15. It is designed for statisticians, quality control 
analysts, engineers, and other technical personnel in industrial and management 
positions. Emphasis will be placed on the mathematical techniques of operations 
research and the application of these methods to current industrial and military 
problems. Professor Paul Randolph, of Purdue, is the course director, and Dr. 
Albert Madansky, of the RAND Corporation, and Professor Bernard Lindgren 
of the University of Minnesota will be the instructors. 

The course on Design of Experiments, during June 7-17, is for statisticians, 
quality control personnel, engineers, and others concerned with planning, ana- 
lyzing and interpreting the results of industrial experiments; it is designed for 
persons who have had previous statistical training. The staff includes Professor 
Charles R. Hicks of Purdue; Professor Clyde Y. Kramer of Virginia Polytechnic 
Institute, and Professor Gayle McElrath of the University of Minnesota. 

The third course, on Statistical Methods for Advanced Quality Control, is 
scheduled for September 5-15. Offered at Purdue annually since 1947, this course 
is designed for those who have had the equivalent of one of the intensive courses 
in statistical quality control given during and after the War, and who want to 
learn more about the statistical approach to industrial and research problems. 
The instructors will be Professors Irving W. Burr and Hicks of Purdue, 
Professor Cecil C. Craig of the University of Michigan, and Professor McElrath 
of the University of Minnesota. 





NSF OFFERS RESEARCH OPPORTUNITIES TO COLLEGE AND 
SECONDARY SCHOOL TEACHERS 


College and secondary school science teachers will be encouraged to participate 
in scientific research as a result of two groups of National Science Foundation 
grants announced today. The grants are part of a continuing Foundation effort 
to strengthen science education on all levels, and to advance scientific graduate 
education through participation in basic research. 

Grants totaling about $700,000 were made to 41 educational institutions to 
provide research participation programs for 350 college teachers of science. A 
total of $618,000 was granted 47 institutions for programs for 310 high school 
teachers. Both programs will enable teachers to help carry out the research 
projects of a university or college department and to work directly with the 
researchers in charge of the projects. 

The research opportunities will provide a logical step of professional advance- 
ment for teachers with the required training who need the stimulation and 
identity with science that research participation can provide. The grants are 
expected not only to increase the effectiveness of science teaching, but to contri- 
bute to the numbers of active investigators of fundamental scientific problems. 

Interested individuals may obtain further information from: SPE (SPISE), 
National Science Foundation, Washington 25, D. C. 




UNDERGRADUATES TO PARTICIPATE IN RESEARCH THROUGH 
NSF GRANTS 


Undergraduates will work alongside scientists at more than 250 colleges and 
universities beginning next summer as a result of new National Science Founda- 
tion grants. 

The students will have the opportunity to engage in scientific research either 
as an individual under the supervision of an established scientist, or directly 
with the scientist as a member of a research team. 

The Foundation has made available $3.2 million in 357 grants through its 
Undergraduate Research Participation program to help build the interest of 
superior students in research, to widen their understanding of scientific method, 
and to improve their ability to employ scientific investigative procedures. The 
program is now in its third year. 

The grants, together with 165 awards made last year, will enable research 
participation by a total of 2,400 undergraduates during the summer of 1961 
and about 1,900 during the 1961 academic year. A number of the grants are for 
two years, permitting extension through the 1962-3 academic year. 

Under the grants about 37 per cent of the participants will work in chemistry, 
26 per cent in the biological sciences, 13 per cent in engineering, and 12 per cent 
in physics, with the remainder in astronomy, geology, mathematics, psychology, 
and the quantitative social sciences. 





IMS FELLOWS—1960 


The following individuals have recently been elected as Fellows of the Institute of Mathematical 
Statistics: 

R. A. Bradley          D. M. Gilford 
T. Dalenius            A. T. James 
C. Derman              M. V. Johns, Jr. 
M. Dwass               E. Parzen 
I. R. Savage 




DOCTORAL DISSERTATIONS IN STATISTICS, 1960 


The list of doctoral dissertations in statistics and related fields, usually 
published in the June issue of the Annals of Mathematical Statistics, will be 
published this year in the September issue. 




PUBLICATIONS RECEIVED 


Annuario Estadistico de España, Edicion Manual, Vol. 35, Presidencia del Gobierno, Insti- 
tuto Nacional de Estadistica, Ferraz 41, Madrid, Spain, 1961. 

Graybill, Franklin A., An Introduction to Linear Statistical Models, Vol. 1, McGraw-Hill 
Book Co., New York, 1961, 463 pp. $12.50. 

Current Projects on Economic and Social Implications of Scientific Research and Development, 
1960, National Science Foundation, 124 pp. (Copies of this manual are available from 
the Superintendent of Documents, U.S. Government Printing Office, Washington 25, 
D.C. for 40¢ a copy.) 

Lieberman, Gerald J. and Donald B. Owen, Tables of the Hypergeometric Probability Distri- 
bution, Stanford University Press, Stanford, 1961. $15.00. 

Marsaglia, George, Tables of the Distribution of Quadratic Forms of Ranks Two and Three, 
Boeing Scientific Research Laboratories, Seattle, Washington, 1960, 61 pp. Free. 

Pillai, K. C. Sreedharan, Statistical Tables for Tests of Multivariate Hypotheses, The Statis- 
tical Center, University of the Philippines, 1961, 46 pp. + vii. 

Salzer, Herbert E. and Kimbro, Genevieve M., Tables for Bivariate Osculatory Interpolation 
Over a Cartesian Grid, Convair-Astronautics, San Diego, California, 1961, 40 pp. 

Salzer, Herbert E. and Roberson, Peggy T., Table of Coefficients for Obtaining the Second 
Derivative Without Differences, Convair-Astronautics, San Diego, California, 1961, 25 pp. 

Salzer, Herbert E., Shoultz, Dexter C. and Thompson, Elizabeth P., Tables of Osculatory 
Integration Coefficients, Convair-Astronautics, San Diego, California, 1960, 42 pp. 

Wissenschaftliche Tabellen, 6th ed., J. R. Geigy S.A., Basel, Switzerland, 1960, 742 pp. 





STUDIES IN 
Item Analysis and Prediction 


Edited by Herbert Solomon 


This integrated series of mathematical studies presents many new theo- 
retical developments in both test design and the classification of individ- 
uals on the basis of responses to tests. $8.75 


Contributors 
R. R. Bahadur Edward Paulson 
Albert H. Bowker Howard Raiffa 
Gustav Elfving Rosedith Sitgreaves 
Milton Vernon Johns Herbert Solomon 
Paul F. Lazarsfeld Daniel Teichroew 


At your bookstore 


STANFORD UNIVERSITY PRESS 





ADVERTISING IN 


THE ANNALS of 
MATHEMATICAL STATISTICS 


ADVERTISEMENTS for books, recruitment of professional 
personnel, etc., may now be placed in the Annals of 
Mathematical Statistics. Only full-page and half-page 
advertisements will be accepted. For details about 
costs, deadlines, sizes, and so on, please write to 


Mr. Edgar M. Bisgyer 
Advertising Manager 
American Statistical Assn. 
1757 K Street, N.W. 
Washington 6, D. C. 





The first two volumes in the Statistical Research 
Monographs series sponsored by the Institute of 
Mathematical Statistics and by the University 
of Chicago 


The Passage Problem for a Stationary Markov Chain 


By J. H. B. Kemperman. Presents systematically a number of 
methods useful in studying the problems of first passage and ab- 
sorption in a Markov chain; in particular, methods for obtaining 
exact formulae for the probabilities under consideration or their 
moments. Numerous illustrations show adequately how each method 
serves as a natural tool for handling a large number of practical 
problems. 


Statistical Inference for Markov Processes 


By Patrick Billingsley. A general mathematical theory for the statis- 
tical problems of determining whether Markov models fit empirical 
data and of estimating any parameters upon which the models 
may depend. The applications which illustrate the mathematical 
results make the book useful to workers in the applied fields as 
well as to mathematicians, statisticians, and graduate students in 
statistics. $4.00 


UNIVERSITY OF CHICAGO PRESS 


5750 Ellis Avenue, Chicago 37, Illinois 








BIOMETRIKA 


Volume 48, Parts 1 and 2 Contents June 1961 


Memoirs: 


Kendall, M. G. Studies in the history of probability and statistics XI. Daniel Bernoulli on maximum
likelihood. David, F. N. & Mallows, C. L. The variance of Spearman’s rho in normal samples. Fieller, E. C.
& Pearson, E. S. Tests for rank correlation coefficients. II. Durbin, J. Some methods of constructing exact
tests. Heathcote, C. R. Preemptive priority queueing. Hasgnat, J. A two-sample aepeeneae eS test. Nabeya,
S. Absolute and incomplete moments of the multivariate normal distribution. White, John S. Asymptotic
expansions for the mean and variance of the serial correlation coefficient. Starks, T. H. & David, H. A.
Significance tests for paired-comparison experiments. Watson, G. S. Goodness-of-fit tests on a circle. Gonin,
H. T. The use of orthogonal polynomials of the positive and negative binomial frequency functions in curve
fitting by Aitken’s method. Verhagen, A. M. W. The estimation of regression and error-scale parameters,
when the joint distribution of the errors is of a continuous form and known apart from a scale parameter.
Mallows, C. L. Latent vectors of random symmetric matrices. Harter, H. Leon. Expected values of normal
order statistics. Haight, Frank A. A distribution analogous to the Borel-Tanner. Nicholson, W. L.
Occupancy probability distribution critical points. Okamoto, Masashi & Ishii, Goro. Test of independence in
intraclass 2 x 2 tables.


Miscellanea: Contributions by M. Atiqullah, D. E. Barton, D. E. Barton and F. N. David, Colin R.
Blyth and David W. Hutchinson, W. J. Ewens, J. Gani, J. C. Gower, M. J. R. Healy and J. C. Gower,
M. G. Kendall, K. C. S. Pillai and Angeles R. Buenaventura, M. M. Sondhi, J. C. Tanner, A. M.
Walker.


Reviews 
Other Books received. 
Corrigenda. 


The subscription, payable in advance, is 54/- (or $8.00) per volume (including postage). Cheques should be made
payable to Biometrika, crossed “a/c Biometrika Trust” and sent to the Secretary, Biometrika Office,
University College, London, W.C.1. All foreign cheques must be drawn on a Bank having a London agency.


Issued by THE BIOMETRIKA OFFICE, University College London. 





Announcing a new series of publications 


SELECTED TRANSLATIONS IN 
MATHEMATICAL STATISTICS AND PROBABILITY 


Volume I 


This volume contains 25 papers. Published for the Institute of 
Mathematical Statistics by the American Mathematical Society 


5% discount to members of IMS and AMS 


306 pages 


Orders for copies of Volume I and standing orders 
for this new series should be sent to the 


AMERICAN MATHEMATICAL SOCIETY 


190 Hope Street, Providence 6, R. I. 





ESTADISTICA 


Journal of the Inter American Statistical Institute 
Vol. XVIII, No. 68 September 1960 
CONTENTS 


El Instituto Interamericano de Estadistica M. A. Teixeira de Freitas 
IASI is Twenty Stuart A. Rice 
Determinant Factors in the Creation of IASI and Bases for its Initial Activities 
Halbert L. Dunn 
En el Vigésimo Aniversario del IASI..................... Francisco de wo 
The Work of the Committee on Improvement of National Statistics (COINS) 
Raymond T. Bowman 
COTA—1950 y los Censos Decenales de América O. Alexander de Moraes 
The Statistical Situation in America Tulo H. Montenegro 
La Situacién Estadistica en América Tulo H. Montenegro 
La Encuesta Social de Port Louis (Mauricio) (traduccién) Leo Silberman 
Tasas de Reemplazo de la Mano de Obra Te en los Paises Centroamericanos 
(traduccién) ... Louis J. Ducoff y Gladys Bowles 
Legal Provisions. International Resolutions Relating to Statistics. Institute Affairs. 
Statistical News. Publications. 


Published quarterly Annual subscription price $3.00 (U. S.) 
INTER AMERICAN STATISTICAL INSTITUTE 


Pan American Union 
Washington 6, D. C. 





TRABAJOS DE ESTADISTICA 


Review published by the “Instituto de Investigaciones Estadisticas” of the “Consejo
Superior de Investigaciones Cientificas,” Madrid-6, Spain.


VOL XI. 1.960. Cuad. III. 
CONTENTS 


S. Rios    Observaciones sobre la regresión y la estimación.
D. E. Barton and F. N. David    A congregating of worms and woodlice.
Nico F. Laubscher    On the stabilization of the Poisson variance.
NOTAS.
S. Rios    Matemáticos de ayer y de hoy.
I. Yañez    Sobre los estimadores suficientes.
S. Rios    El Centro Militar de preparación para la investigación Operativa en
Francia.


CRONICA BIBLIOGRAFIA CUESTIONES Y EJERCICIO 


For everything in connection with works, exchanges and subscription write to
Professor Sixto Rios, Instituto de Investigaciones Estadisticas, C.S.I.C., Serrano 123,
Madrid-6, Spain.


The Review is composed of three fascicles published three times a year (about
350 pp), and its annual price is 100 pesetas for Spain and 240 pesetas for all
other countries.





JOURNAL OF 
THE AMERICAN STATISTICAL ASSOCIATION 


Volume 56 June, 1961 Number 294 
Confidence Curves: An Omnibus Technique for Estimation and Testing Statistical
Hypotheses Allan Birnbaum
Changes in the Size Distribution of Dividend Income Edwin B. Cox
Note on Curve Fitting with Minimum Deviations by Linear Programming
Walter D. Fisher
Bivariate Logistic Distributions E. J. Gumbel 


Partial Correlations in Regression Computations.............. Robert L. Gustafson 
An Analysis of Consistency of Response in Household Surveys 
Carol M. ars and Jean L. Pennock 
Multiple Regression Analysis of a Poisson Process. .... Dale W. Jorgenson 
Factorial Treatments in Rectangular Lattice Designs 
Clyde Y. Kramer and Leroy S. Brenna 
Significance Tests in Discrete Distributions. ... H. O. Lancaster 
Exact and Approximate Distributions for the Wilcoxon Statistic with Ties 
Shirley Y. Lehman 
The Use of Sample Quasi-ranges in Setting Confidence Intervals for the Population
Standard Deviation Fred C. Leone, C. W. Topp, and Y. H. Rutenberg
Randomized Rounded-off Multipliers in Sampling Theory
M. N. Murthy and V. K. Sethi
Unbiased Componentwise Ratio Estimation D. S. Robson and Chitra Vithayasai
A Note on Measurement Errors and Detecting Real Differences Eugene Rogot 
A Quarterly Econometric Model of the United States 
Lowell E. Gallaway and Paul E. Smith 
On the Use of Partially Ordered Observations in Measuring the Support for a
Complete Order R. F. Tate
The Statistical Work of Oskar Anderson Gerhard Tintner 
A Problem Concerned with Weighting of Distributions Coleridge A. Wilkins 


For further information, please contact: 


AMERICAN STATISTICAL ASSOCIATION 
1757 K Street, N.W. Washington 6, D. C. 





SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Contents of Series A, Vol. 23—Part 1 


Foreword P. C. Mahalanobis 
Sampling the reference set Sir Ronald A. Fisher 


Simple approximations to the probability integral and P(χ², 1) when both are small
J. B. S. Haldane
Mechanistic model of a random phenomenon
A study of large sample test criteria through properties of efficient estimates
C. Radhakrishna Rao
A method of fractile graphical analysis P. C. Mahalanobis
On some properties of error area in the fractile graph method K. Takeuchi
Some limit distributions connected with fractile graphical analysis J. Sethuraman
Some limit theorems in regression theory
K. R. Parthasarathy and P. K. Bhattacharya


Annual Subscription: 30 rupees ($10.00), 10 rupees ($3.50) per issue.
Back Numbers: 45 rupees ($15.00) per volume; 12/8 rupees ($4.50) per issue.
Subscriptions and orders for back numbers should be sent to 


STATISTICAL PUBLISHING SOCIETY 
204/1 Barrackpore Trunk Road Calcutta 35, India 





TECHNOMETRICS 


A Journal of Statistics 


for the Physical, Chemical and Engineering Sciences 


CONTENTS 
TECHNOMETRICS, Vol. 3. No. 2, May, 1961 


General Considerations in the Analysis of Spectra G. M. Jenkins; Mathematical Considerations in the
Estimation of Spectra Emanuel Parzen; Discussion, Emphasizing the Connection Between Analysis of Variance
and Spectrum Analysis John W. Tukey; Some Comments on Spectral Analysis of Time Series N. R.
Goodman; Comments on the Discussions of Messrs. Tukey and Goodman G. M. Jenkins and Emanuel Parzen;
Spectral Analysis Combining a Bartlett Window with an Associated Inner Window Thomas H. Wonnacott;
Frequency Response from Stationary Noise: Two Case Histories N. R. Goodman, S. Katz, B. H. Kramer
and M. T. Kuo; The Modified Gauss-Newton Method for the Fitting of Non-Linear Regression Functions by
Least Squares H. O. Hartley; On the Possibility of Improving the Mean Useful Life of Items by Eliminating
Those with Short Lives G. S. Watson and W. R. Wells; The Optimum Allocation of Spare Components in
System Donald F. Morrison; Use of Tables of Percentage Points of Range and Studentized Range H. Leon
Harter; Book Review C. Daniel; Statistical Programs for High Speed Computers; Notices


Technometrics is published quarterly in February, May, August, and November. To members of the
American Statistical Association and the American Society for Quality Control the rate is $6.00. The
annual non-member subscription rate is $8.00. Checks should be made payable to Technometrics and
addressed to Technometrics, Post Office Box 587, Benjamin Franklin Station, Washington 6, D. C.





PROCEEDINGS OF SYMPOSIA IN


APPLIED MATHEMATICS 


These symposia were held under the auspices of the American Mathe- 
matical Society and other interested organizations. The Society itself pub- 
lished the first two volumes. The McGraw-Hill Book Company, Inc., published 
and sold Number 3 through 8. These six volumes have now been transferred 
to the American Mathematical Society by special arrangement with the 
McGraw-Hill Book Company, Inc. Orders should be placed through the
Society.


Members of the Society are entitled to the usual 25% discount
on all the volumes.


Volume 1 


NON-LINEAR PROBLEMS IN MECHANICS OF 
CONTINUA, 1949. vii + 219 pp. $5.25 


Volume 2 


ELECTROMAGNETIC THEORY, 1950, iii + 
91 pp. $3.60 


Volume 3 
ELASTICITY, 1950. vi + 233 pp. $6.00 


Volume 4 
FLUID DYNAMICS, 1953. vi + 186 pp. $7.00 


Volume 5 


WAVE MOTION AND VIBRATION THEORY, 
1954. vi + 169 pp. $7.00 


Volume 6 


NUMERICAL ANALYSIS, 1956. vi + 303 pp.
$9.75


Volume 7 
APPLIED PROBABILITY, 1957. v + 104 pp. 


Volume 8 


CALCULUS OF VARIATIONS AND ITS APPLI- 
CATIONS, 1958. v + 153 pp. $7.50 


Volume 9 
ORBIT THEORY, 1959. v + 195 pp. $7.20 


Volume 10 


COMBINATORIAL ANALYSIS, 1960. vii + 
311 pp. $7.70 


Volume 11 
NUCLEAR REACTOR THEORY (In Preparation) 


Volume 12 


THE STRUCTURE OF LANGUAGE AND ITS 
MATHEMATICAL ASPECTS (in Preparation) 


Order from 


AMERICAN MATHEMATICAL SOCIETY 


190 Hope Street, Providence 6, Rhode Island 















THE INSTITUTE OF MATHEMATICAL STATISTICS
Abstract of
Contributed or Invited Paper


Name with title, institution, and address of author:


Title of Paper:


Meeting to which paper pertains (Contributed papers by title need not pertain to any meeting; in that case leave this space blank):


Check status of paper: Contributed [] Invited []
Check method of presentation: In person [] By title []


If slide or projection equipment is wanted, describe it clearly. A specific kind of equipment cannot be assured.


Journal to which paper will be offered:


TWO COPIES of an abstract of every contributed paper for presentation at a meeting of the Institute must be sent to the Editor by
the time designated in the announcement of the meeting. Meetings are announced by mail and on the back cover of the Annals of
Mathematical Statistics. The title of a paper not in form for publication should end with the words “Preliminary Report.”


Abstracts for invited papers are optional.


Anyone who is not a member of the Institute must be introduced by a member in order to present a paper, either in person or
by title. A signed statement from the introducing member should be attached to the abstract.


Papers may not be submitted if published in full before the date of the Institute meeting or if previously presented to any learned
society.


Only one paper per member may be presented in person at any one meeting. Ten minutes is the normal time allotted to each.
The number of papers presented by title is not restricted.


Abstracts of invited papers at joint sessions with other organizations should not be submitted to the Editor if the other
organization is arranging for their publication. Contributed Paper sessions are usually not held jointly with other organizations. When they are,
a special announcement will be made about publication of abstracts.


Abstract blanks will be furnished to members by the Secretary on request. Members are urged to use the printed abstract blanks in
order to save time and reduce error. Abstracts may be returned to their authors and publication delayed if they are not clear or do
not give full information.

















INSTRUCTIONS TO AUTHORS


The abstract should state clearly the methods and results of the paper with as little detail as possible. Abstracts should not
exceed 200 words, or the equivalent, and in general should consist of a single paragraph. References should be contained in the body
of the abstract and should follow the standard format of the Annals of Mathematical Statistics. Unusual symbols should be avoided,
and formulas should be expressed as simply as possible; for example, formula symbols should, whenever feasible, run in a
horizontal sequence. Displayed formulas should be avoided whenever possible. Abstracts should be typewritten to the maximum extent,
double spaced, and in form for immediate publication. Remember to send two copies.


Indicate marginally in pencil the names of Greek, German, or script letters and those of unusual symbols. Keep the use of such
letters and symbols to a minimum. Also indicate names of symbols that may be ambiguously read, for example one and el, or
capital oh, small oh (when used in formulas) and zero. Distinguish between capital and lower case forms when they look alike.


Author: these columns
for instructions to


Author: type abstract below








THE INSTITUTE OF MATHEMATICAL STATISTICS 
(Organized September 12, 1935) 
OFFICERS 
President: 
E. L. Lehmann, Department of Statistics, University of California, Berk- 
eley 4, California 
President-Elect: 


A. H. Bowker, Department of Statistics, Stanford University, Stanford, 
California 

Secretary: 
G. E. Nicholson, Jr., Department of Statistics, University of North Car- 
olina, Chapel Hill, North Carolina 

Treasurer: 


Gerald Lieberman, Institute of Mathematical Statistics, Sequoia Hall, 
Stanford University, Stanford, California 


Editor: 


William Kruskal, Department of Statistics, Eckhart Hall, University of 
Chicago, Chicago 37, Illinois 


The purpose of the Institute of Mathematical Statistics is to encourage the 
development, dissemination, and application of mathematical statistics. 

Membership dues including a subscription to the ANNALS OF MATHE- 
MATICAL STATISTICS are $10.00 per year for residents of the United States 
or Canada and $5.00 per year for residents of other countries. There are special 
rates for students and for some other classes of members. Inquiries regarding 
membership in the Institute should be sent to the Secretary of the Institute. 





Contents (continued) 


An Exponential Bound on the Strong Law of Large Numbers for Linear
Stochastic Processes with Absolutely Convergent Coeffici
Expected Utility for Queues Servicing Messages with Exponentially
Decaying Utility Frank A. Haight 587
On the Coding Theorem for the Noiseless Channel Patrick Billingsley 594


NOTES
The Essential Completeness of the Class of Generalized Sequential
Probability Ratio Tests M. H. DeGroot 602
A Problem in Survival James B. MacQueen 605
First Passage Time for a Particular Gaussian Process D. Slepian 610
A Note on the Ergodic Theorem of Information Theory K. L. Chung 612
Remark Concerning Two-State Semi-Markov Processes Cyrus Derman 615
An Example of an Ancillary Statistic and the Combination of Two
Samples by Bayes’ Theorem D. A.
Correction Notes
Abstracts of Papers
News and Notices
Publications Received


MEETINGS OF THE INSTITUTE 


TENTATIVE SCHEDULE 


ANNUAL MEETING—Seattle, Washington 
June 14-17, 1961 
EASTERN REGIONAL MEETING— 
New York, December 27-30, 1961 


Abstracts should be submitted in duplicate to the Editor, on abstract blanks, 
which can be found in the back of every Annals, beginning with the March 1961 
issue or can be obtained from the IMS Secretary. Abstracts must be received 
at least 50 days before the first day of the meeting at which they are to be pre- 
sented, indicating whether presented by title or in person. (Only one con- 
tributed paper may be given in person at any one meeting.) They may be 
printed prior to the publication of the report of the meeting. Those received 
by April 30 will appear in the September Annals, by July 31 in December, etc. 
Abstracts should be limited to 200 words or the equivalent, and should avoid 
displayed expressions and complicated formulae. They can be accepted from 
non-members of the IMS only if transmitted by members. Abstracts must 
follow the stylistic requests on the abstract blank, or they may be returned. 







